Use Fiat Crypto for non-x86_64 platforms, like BoringSSL. Continue
using the nistz256 code on Windows, differently from BoringSSL.
Make *ring* more consistent with BoringSSL.
Originally, when we imported fiat-crypto field operations, the pipeline
was in early stages and the generated code had to be manually integrated
with the rest of the curve implementation, so we moved all our
supporting code to third_party/fiat for simplicity. Over time more
supporting code, like the table generators, landed there to be next to
its callers.
fiat-crypto now generates standalone files which we #include into the
supporting code. This moves the supporting code back to the usual
location. It also updates the README.md file to reflect the new
pipeline. (Most of it was a documentation of the old pipeline, which was
much more manual.)
Change-Id: I64db7235feb6566f0d3cd4db3a7146050edaf25a
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/40904
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: Adam Langley <agl@google.com>
I missed a symbol.
Change-Id: I83c6828620a54eaab26cad08b1714402a2758fc0
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/40905
Commit-Queue: David Benjamin <davidben@google.com>
Commit-Queue: Adam Langley <agl@google.com>
Reviewed-by: Adam Langley <agl@google.com>
This file gets #included into other files, so we shouldn't use generic
names like 'fe'. This will let us import other fiat-crypto curves in the
future, if we want them.
Change-Id: Ie4e222729bde7e4ccd368b86fb9048a2ea4a58ac
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/40824
Reviewed-by: David Benjamin <davidben@google.com>
Reviewed-by: Steven Valdez <svaldez@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
This pulls in the latest upstream files and reapplies our value_barrier patch.
In particular, https://github.com/mit-plv/fiat-crypto/pull/723 made 64-bit
P-256 faster!
32-bit x86, gcc
Before:
Did 3150 ECDH P-256 operations in 4027477us (782.1 ops/sec)
Did 9912 ECDSA P-256 signing operations in 4067212us (2437.1 ops/sec)
Did 3772 ECDSA P-256 verify operations in 4059197us (929.2 ops/sec)
Did 74800 Ed25519 key generation operations in 4020883us (18602.9 ops/sec)
Did 74000 Ed25519 signing operations in 4001827us (18491.6 ops/sec)
Did 21371 Ed25519 verify operations in 4024606us (5310.1 ops/sec)
Did 78000 Curve25519 base-point multiplication operations in 4051574us (19251.8 ops/sec)
Did 25133 Curve25519 arbitrary point multiplication operations in 4063280us (6185.4 ops/sec)
After:
Did 3250 ECDH P-256 operations in 4025179us (807.4 ops/sec) [+3.2%]
Did 10277 ECDSA P-256 signing operations in 4084926us (2515.8 ops/sec) [+3.2%]
Did 3895 ECDSA P-256 verify operations in 4048734us (962.0 ops/sec) [+3.5%]
Did 74480 Ed25519 key generation operations in 4002460us (18608.6 ops/sec) [+0.0%]
Did 74000 Ed25519 signing operations in 4004425us (18479.6 ops/sec) [-0.1%]
Did 21756 Ed25519 verify operations in 4038856us (5386.7 ops/sec) [+1.4%]
Did 78000 Curve25519 base-point multiplication operations in 4031991us (19345.3 ops/sec) [+0.5%]
Did 25133 Curve25519 arbitrary point multiplication operations in 4064925us (6182.9 ops/sec) [-0.0%]
x86_64, clang, OPENSSL_SMALL
Before:
Did 20090 ECDH P-256 operations in 4019408us (4998.2 ops/sec)
Did 56000 ECDSA P-256 signing operations in 4004370us (13984.7 ops/sec)
Did 23562 ECDSA P-256 verify operations in 4062283us (5800.2 ops/sec)
Did 127000 Ed25519 key generation operations in 4005053us (31709.9 ops/sec)
Did 128000 Ed25519 signing operations in 4021902us (31825.7 ops/sec)
Did 71000 Ed25519 verify operations in 4036015us (17591.6 ops/sec)
Did 132000 Curve25519 base-point multiplication operations in 4002101us (32982.7 ops/sec)
Did 93000 Curve25519 arbitrary point multiplication operations in 4023827us (23112.3 ops/sec)
After:
Did 22263 ECDH P-256 operations in 4005099us (5558.7 ops/sec) [+11.2%]
Did 61000 ECDSA P-256 signing operations in 4024810us (15156.0 ops/sec) [+8.4%]
Did 27426 ECDSA P-256 verify operations in 4038547us (6791.1 ops/sec) [+17.1%]
Did 128000 Ed25519 key generation operations in 4015033us (31880.2 ops/sec) [+0.5%]
Did 127000 Ed25519 signing operations in 4003894us (31719.1 ops/sec) [-0.3%]
Did 70000 Ed25519 verify operations in 4017446us (17424.0 ops/sec) [-1.0%]
Did 132000 Curve25519 base-point multiplication operations in 4006282us (32948.3 ops/sec) [-0.1%]
Did 93000 Curve25519 arbitrary point multiplication operations in 4025190us (23104.5 ops/sec) [-0.0%]
Change-Id: I2f705772899c701480ca0e0885e6b75dd1bb1f5d
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/40746
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: Adam Langley <agl@google.com>
It is tricky to create EC_FELEMs right now. This will simplify making them.
Change-Id: Icde518efed61381004e2e5569a45d653af48ca2a
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/40585
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: Adam Langley <agl@google.com>
This fixes two issues. First, we have been lax about whether the
low-level inversion functions fail on zero input or output zero. Fix the
documentation and call the latter inv0 or inverse0 to match the
terminology used in draft-irtf-cfrg-hash-to-curve. (Although we may not
actually need inv0 given the optimization in D.2.)
This has no actual effect because the functions were only used in
contexts where the inputs were already guaranteed to be non-zero. Still,
we should be consistent here.
Second, ec_scalar_inv_montgomery and ec_scalar_inv_montgomery_vartime
claim to perform the same operation, but they do not. First, one
computed inv0 and the other computed inv (except only in some
implementations, so fix it to be consistent). Second, the former
computes inverses in the Montgomery domain, while the latter converts to
the Montgomery domain and then inverts. Rename it to
ec_scalar_to_montgomery_inv_vartime, which is... questionably
understandable but at least looks different.
Change-Id: I9b4829ce5013bdb9528078a093f41b1b158df265
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/40526
Reviewed-by: Adam Langley <agl@google.com>
In doing so, add an IgnoreAllInstructions option for FileTest. FileTest
tracks unused fields so test drivers don't accidentally miss a portion
of the test. Wycheproof tests, however, have many different key formats
in instructions. These are tedious to list out, so add an option to
ignore them, on the assumption that checking attributes is more useful
than instructions.
Change-Id: I01cc9f3a95577d576c8c2dd68f5092fceb3215b1
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/39191
Reviewed-by: Adam Langley <agl@google.com>
It's unclear to me whether the normal rsa_signature_test.txt file is
still needed with these, but I've left it in for now.
Change-Id: I6db9c9556820263c2b0bc37d144e6403b9a7a178
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/39189
Reviewed-by: Adam Langley <agl@google.com>
This CL updates the JSON conversion to preserve the flags. A
WycheproofResult now captures both "result" and "flags". An "acceptable"
test case's validity is determined by its flags. By default, we consider
an "acceptable" case as invalid, but a test driver may mark some of them
as valid by listing the flags as a parameter.
Previously, some Wycheproof tests (I think it was x25519_tests.txt?) did
not contain enough information to resolve this unambiguously. This has
since been fixed.
This also makes the converted files smaller because we no longer expand the
flags into comments.
Change-Id: I2ca02d7f1b95f250409e8b23c4ad7bb595d77fdf
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/39188
Commit-Queue: Adam Langley <agl@google.com>
Reviewed-by: Adam Langley <agl@google.com>
Wycheproof have added many more tests. They'll be imported in subsequent
CLs.
Change-Id: I69d8e09328b08edbd0a96757db26b380d7a7c7ee
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/39184
Reviewed-by: Adam Langley <agl@google.com>
The experiment which motivated CECPQ2b has concluded (although the
results haven't been published yet) and the SIKE code is causing some
issues for gRPC in gprc/grpc#20100. Also, this is code size that takes
up space in Android etc.
Change-Id: I43b0b8c420f236c0fe9b40bf2517d2fde98495d5
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/38384
Reviewed-by: David Benjamin <davidben@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
This path has always had one-too-few “..” elements since the file first
appeared, but everyone seems to have lived with it, presumably because
/include is in the search path and the compiler tries relative to that.
Change-Id: I30006209ad74d064ded5dd2cd34b1f14806dcffe
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/38344
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: Adam Langley <agl@google.com>
I should have noticed this previously, but the SIKE code was exporting
symbols called generic things like “params”. They're not dynamically
exported, but BoringSSL is often statically linked so better to ensure
that these things are prefixed to avoid the risk of collisions.
Change-Id: I3a942dbc8f4eab703d5f1d6898f67513fd7b578c
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/36745
Commit-Queue: Adam Langley <agl@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: David Benjamin <davidben@google.com>
We already have crypto/dh/params.c and some of our downstream consumers
cannot take two source files with the same name in the same build
target.
Change-Id: I324ace094c2215b443e98fc9ae69876ea1929efa
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/36744
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: Adam Langley <agl@google.com>
* CECPQ2b will use SIKE/p434 instead of SIKE/p503
* KEM uses SHA256 instead of HMAC-256
* implements new starting curve: y^2=x^3 + 6x^2 + x
* adds optimized implementation for aarch64
* adds optimized implementation for AMD64
which do not support MULX/ADOX/ADCX
* syncs the SIKE test code with the NIST Round 2
specification.
* removes references to field size from variables
names, tests and defines.
Change-Id: I5359c6c62ad342354c6d337f7ee525158586ec93
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/36704
Reviewed-by: Adam Langley <agl@google.com>
Clang recognizes the (mask & a) | (~mask & b) pattern as a select. While
it often optimizes this into a cmov, it sometimes inserts branches
instead, particularly when it detects a string of cmovs with the same
condition.
In the long term, we need language-level support for expressing our
constraints. In the short term, introduce value barriers to prevent the
compiler from reasoning about our bit tricks. Thanks to Chandler Carruth
for suggesting this pattern. It should be reasonably robust, short of
value-based PGO or the compiler learning to reason about empty inline
assembly blocks.
Apply barriers to our various constant-time selects. We should invest
more in the valgrind-based tooling to figure out if there are other
instances.
Change-Id: Icc24ce36a61f7fec021a762c27197b9c5bd28c5d
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/36484
Reviewed-by: Chandler Carruth <chandlerc@google.com>
Reviewed-by: Adam Langley <agl@google.com>
Point addition formulas for short Weierstrass curves are often
incomplete and do not work for P + P. EC implementations usually rely on
constant-time operations never hitting this case, or at least it being
rare[0].
However, the condition checks several values. Our C functions use && and
||, and the P-256 assembly also branches. This can leak intermediate
values via a small side channel. Thanks to David Schrammel and Samuel
Weiser for reporting this.
nistz256 base point multiplication (keygen, ECDSA signing) is unaffected
due to ecp_nistz256_point_add_affine lacking a doubling case. nistp224
and nistp256 base point multiplication, on some compilers, are saved by
quirks of the "mixed" path. The generic code's base point multiplication
and all methods' arbitrary point multiplication (ECDH; ephemeral keys
makes this less interesting) are affected.
Fix the branches in the nistz256 assembly, and use bit operations in C.
Note the C versions are all different because P-224 believes true is 1,
P-256 believes true is any non-zero value, and the generic code believes
true is 0xf...f. This should be double-checked when reviewing.
Aside: The nistz256 assembly also special-cases nontrivial P + (-P) in
arbitrary point multiplication. Fortunately, the formulas in util.c hold
there and I believe one can show P + (-P) is unreachable for all curves.
Still, it would be nice to omit the branch if we can verify the assembly
works anyway.
[0] 03da376ff7/crypto/ec/ecp_nistp521.c (L1259)
Change-Id: I8958624cd6b5272e5076c6c1605ab089e85f4cb7
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/36465
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: Adam Langley <agl@google.com>
KWP is Key Wrap with Padding, defined in RFC 5649 and SP 800-38F. Like
Key Wrap, it's a poor-man's AEAD and shouldn't be used. However, some
existing systems use it and we need to interoperate.
The interface of the added functions is a little unfortunate, but they
match the interfaces of the existing Key Wrap functions which, in turn,
match functions in OpenSSL. Hopefully this way, if OpenSSL ever add
support, we'll line up.
Change-Id: I3496c288f32230a891261586ca2e9c4ee8456c09
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/36324
Commit-Queue: Adam Langley <agl@google.com>
Reviewed-by: David Benjamin <davidben@google.com>
See I9c20b660ce4b58dc633588cfd5b2e97a40203ec3. Aside for p224-64.c, we'd
already split mul_public into a dedicated function, at which point it's
simpler to just have three functions.
This makes it clearer when we do and don't care about the doubling case
coming up and avoids a bunch of NULL checks.
Change-Id: I7c5dafa7f12f4f53937d912ba22c90cb5a786f04
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/36225
Commit-Queue: Adam Langley <agl@google.com>
Reviewed-by: Adam Langley <agl@google.com>
These even trip UBSan because they break alignment requirements. The
crypto_word_t isn't doing anything here, so just read bytes.
Change-Id: Icb6dfce2c3a10f8252bbb0889cbeedcf1e8d8e62
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/36066
Reviewed-by: Adam Langley <alangley@gmail.com>
Commit-Queue: David Benjamin <davidben@google.com>
Most of the crypto_word_t* casts in third_party/sike are due to p503
being defined with uint64_t. This is a strict aliasing violation and
easily avoided with a TOBN-like macro when defining p503.
This clears almost all of the casts. Also remove an unused stdbool.h
include.
Change-Id: Ife3a4ec620f8b7f4e09c87c6fc43d8b82396046b
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/36064
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: Adam Langley <alangley@gmail.com>
Based on Microsoft's implementation available on github:
Source: https://github.com/Microsoft/PQCrypto-SIDH
Commit: 77044b76181eb61c744ac8eb7ddc7a8fe72f6919
Following changes has been applied
* In intel assembly, use MOV instead of MOVQ:
Intel instruction reference in the Intel Software Developer's Manual
volume 2A, the MOVQ has 4 forms. None of them mentions moving
literal to GPR, hence "movq $rax, 0x0" is wrong. Instead, on 64bit
system, MOV can be used.
* Some variables were wrongly zero-initialized (as per C99 spec).
* Rewrite x86_64 assembly to AT&T format.
* Move assembly for x86_64 and aarch64 to perlasm.
* Changes to aarch64 assembly, to avoid using x18 platform register.
Assembly also correctly constructs linked list of stack-frames as
described in AAPCS64, 5.2.3.
* Move constant values to .RODATA segment, as keeping them in .TEXT
segment is not compatible with XOM.
* Fixes issue in arm64 code related to the fact that compiler doesn't
reserve enough space for the linker to relocate address of a global
variable when used by 'ldr' instructions. Solution is to use 'adrp'
followed by 'add' instruction. Relocations for 'adrp' and 'add'
instructions is generated by prefixing the label with :pg_hi21:
and :lo12: respectively.
* Enable MULX and ADX. Code from MS doesn't support PIC. MULX can't
reference global variable directly. Instead RIP-relative addressing
can be used. This improves performance around 10%-13% on SkyLake
* Check if CPU supports BMI2 and ADOX instruction at runtime. On AMD64
optimized implementation of montgomery multiplication and reduction
have 2 implementations - faster one takes advantage of BMI2
instruction set introduced in Haswell and ADOX introduced in
Broadwell. Thanks to OPENSSL_ia32cap_P it can be decided at runtime
which implementation to choose. As CPU configuration is static by
nature, branch predictor will be correct most of the time and hence
this check very often has no cost.
* Reuse some utilities from boringssl instead of reimplementing them.
This includes things like:
* definition of a limb size (use crypto_word_t instead of digit_t)
* use functions for checking in constant time if value is 0 and/or
less then
* #define's used for conditional compilation
* Use SSE2 for conditional swap on vector registers. Improves
performance a little bit.
* Fix f2elm_t definition. Code imported from MSR defines f2elm_t type as
a array of arrays. This decays to a pointer to an array (when passing
as an argument). In C, one can't assign const pointer to an array with
non-const pointer to an array. Seems it violates 6.7.3/8 from C99
(same for C11). This problem occures in GCC 6, only when -pedantic
flag is specified and it occures always in GCC 4.9 (debian jessie).
* Fix definition of eval_3_isog. Second argument in eval_3_isog mustn't be
const. Similar reason as above.
* Use HMAC-SHA256 instead of cSHAKE-256 to avoid upstreaming cSHAKE
and SHA3 code.
* Add speed and unit tests for SIKE.
Some speed results:
Skylake (64-bit):
Did 408 SIKE/P503 generate operations in 1002573us (407.0 ops/sec)
Did 275 SIKE/P503 encap operations in 1070570us (256.9 ops/sec)
Did 264 SIKE/P503 decap operations in 1098955us (240.2 ops/sec)
Skylake (32-bit):
Did 9 SIKE/P503 generate operations in 1051620us (8.6 ops/sec)
Did 5 SIKE/P503 encap operations in 1038251us (4.8 ops/sec)
Did 5 SIKE/P503 decap operations in 1103617us (4.5 ops/sec)
Change-Id: I22f0bb1f9edff314a35cd74b48e8c4962568e330
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/35204
Reviewed-by: Adam Langley <alangley@gmail.com>
The new version of googletest deprecates INSTANTIATE_TEST_CASE_P in
favor of INSTANTIATE_TEST_SUITE_P, so apply the change.
This requires blacklisting C4628 on MSVC 2015 which says about digraphs
given foo<::std::tuple<...>>. Disable that warning. Digraphs are not
useful and C++11 apparently explicitly disambiguates that.
It also requires applying
https://github.com/google/googletest/pull/2226, to deal with a warning
in older MSVC.
Update-Note: Consumers using BoringSSL with their own copy of googletest
must ensure googletest was updated to a version from 2019-01-03 or
later for INSTANTIATE_TEST_SUITE_P to work. (I believe all relevant
consumers are fine here. If anyone can't update googletest and is
building BoringSSL tests, building with
-DINSTANTIATE_TEST_SUITE_P=INSTANTIATE_TEST_CASE_P would work as
workaround.)
Bug: chromium:936651
Change-Id: I23ada8de34a53131cab88a36a88d3185ab085c64
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/35504
Reviewed-by: Adam Langley <agl@google.com>
Unfortunately, it's not enough to be able to turn it on thanks to the
PURE_VIRTUAL macro. But it gets us most of the way there.
Change-Id: Ie6ad5119fcfd420115fa49d7312f3586890244f4
Reviewed-on: https://boringssl-review.googlesource.com/c/34949
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: Adam Langley <agl@google.com>