Make bn_mod_lshift_consttime faster

bn_mod_lshift_consttime currently calls bn_mod_lshift1_consttime in a
loop, but between needing a temporary value and having to guard against
some complications in our fixed-width BIGNUM convention, it's actually
picking up a lot of overhead.

This function is currently called to set up Montgomery contexts with
secret moduli (RSA primes). The setup operation is not
performance-sensitive in our benchmarks, because it is amortized away in
RSA private key signing. However, as part of reducing thread contention
with the RSA object, I'm planning to make RSA creation, which we do
benchmark, eagerly fill in the Montgomery context.

We do benchmark RSA parsing, so adding a slow Montgomery setup would
show up in benchmarks. This distinction is mostly artificial. Work done
on creation and work done on first use are both still work done once per
RSA key. However, work done on key creation may slow server startup, while
work deferred to first use is amortized but less predictable.

Either way, as this CL and especially the one to follow it show, there is
plenty of low-hanging fruit in this function. As a bonus, this should
help single-use RSA private keys, but that's not something we currently
benchmark.

The modulus sizes below were chosen based on:

- Common curve sizes (moot because we use a variable-time setup anyway)

- Common RSA modulus sizes (also variable-time setup)

- Half of common RSA modulus sizes (the secret primes involved)

Of these, only the third category matters. The others can use the
division-based path where it's faster anyway. However, by the end of
this patch series, the two paths get a bit closer, so I benchmarked them all
to compare. (Though division still wins in the end.)

Benchmarks on an M1 Max:

Before:
Did 528000 256-bit mont (constime) operations in 2000993us (263869.0 ops/sec)
Did 312000 384-bit mont (constime) operations in 2001281us (155900.1 ops/sec)
Did 246000 512-bit mont (constime) operations in 2001521us (122906.5 ops/sec)
Did 191000 521-bit mont (constime) operations in 2006336us (95198.4 ops/sec)
Did 98000 1024-bit mont (constime) operations in 2001438us (48964.8 ops/sec)
Did 55000 1536-bit mont (constime) operations in 2025306us (27156.4 ops/sec)
Did 35000 2048-bit mont (constime) operations in 2022714us (17303.5 ops/sec)
Did 17640 3072-bit mont (constime) operations in 2028352us (8696.7 ops/sec)
Did 10290 4096-bit mont (constime) operations in 2065529us (4981.8 ops/sec)

After:
Did 712000 256-bit mont (constime) operations in 2000454us (355919.2 ops/sec) [+34.9%]
Did 440000 384-bit mont (constime) operations in 2001121us (219876.8 ops/sec) [+41.0%]
Did 259000 512-bit mont (constime) operations in 2003709us (129260.3 ops/sec) [+5.2%]
Did 212000 521-bit mont (constime) operations in 2007033us (105628.6 ops/sec) [+11.0%]
Did 107000 1024-bit mont (constime) operations in 2018551us (53008.3 ops/sec) [+8.3%]
Did 57000 1536-bit mont (constime) operations in 2001027us (28485.4 ops/sec) [+4.9%]
Did 37000 2048-bit mont (constime) operations in 2039631us (18140.5 ops/sec) [+4.8%]
Did 20000 3072-bit mont (constime) operations in 2041163us (9798.3 ops/sec) [+12.7%]
Did 11760 4096-bit mont (constime) operations in 2007195us (5858.9 ops/sec) [+17.6%]
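
For reference, the ops/sec figures and bracketed percentages above follow
from simple arithmetic on the raw counts. A minimal sketch, with
hypothetical helper names that are not part of the benchmark tool:

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical helper: throughput in operations per second, given an
// operation count and the elapsed time in microseconds.
static double ops_per_sec(uint64_t ops, uint64_t usec) {
  assert(usec > 0);
  return (double)ops * 1e6 / (double)usec;
}

// Hypothetical helper: relative throughput change in percent, e.g. the
// "[+34.9%]" annotation on the 256-bit line.
static double speedup_percent(double before_ops_per_sec,
                              double after_ops_per_sec) {
  return (after_ops_per_sec / before_ops_per_sec - 1.0) * 100.0;
}
```

For example, speedup_percent(263869.0, 355919.2) is roughly 34.9, matching
the 256-bit row.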

Bug: 316
Change-Id: I06f4a065fdecc1aec3160fe32a41e200538d1ee3
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/60685
Auto-Submit: David Benjamin <davidben@google.com>
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: Adam Langley <agl@google.com>
David Benjamin 2023-06-10 00:29:28 -04:00 committed by Boringssl LUCI CQ
parent acfb1062f4
commit 98e1227cb7

@@ -711,15 +711,22 @@ int BN_mod_lshift(BIGNUM *r, const BIGNUM *a, int n, const BIGNUM *m,
 
 int bn_mod_lshift_consttime(BIGNUM *r, const BIGNUM *a, int n, const BIGNUM *m,
                             BN_CTX *ctx) {
-  if (!BN_copy(r, a)) {
+  if (!BN_copy(r, a) ||
+      !bn_resize_words(r, m->width)) {
     return 0;
   }
-  for (int i = 0; i < n; i++) {
-    if (!bn_mod_lshift1_consttime(r, r, m, ctx)) {
-      return 0;
+
+  BN_CTX_start(ctx);
+  BIGNUM *tmp = bn_scratch_space_from_ctx(m->width, ctx);
+  int ok = tmp != NULL;
+  if (ok) {
+    for (int i = 0; i < n; i++) {
+      bn_mod_add_words(r->d, r->d, r->d, m->d, tmp->d, m->width);
     }
+    r->neg = 0;
   }
-  return 1;
+  BN_CTX_end(ctx);
+  return ok;
 }
 
 int BN_mod_lshift_quick(BIGNUM *r, const BIGNUM *a, int n, const BIGNUM *m) {
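
The new loop doubles r modulo m a total of n times, using a word-level
modular add with one scratch buffer that is allocated once up front. The
following is a rough, self-contained sketch of that idea on plain uint64_t
arrays. All toy_* names are hypothetical and this is not the BoringSSL
implementation. It assumes a < m, that every array has num words, and it
avoids secret-dependent branches only at the source level (a real
implementation needs more care about what the compiler emits).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// r = a + b over num words; returns the carry out. r may alias a and b.
static uint64_t toy_add_words(uint64_t *r, const uint64_t *a,
                              const uint64_t *b, size_t num) {
  uint64_t carry = 0;
  for (size_t i = 0; i < num; i++) {
    uint64_t s = a[i] + carry;
    uint64_t c1 = s < carry;  // Overflow from adding the carry in.
    uint64_t t = s + b[i];
    uint64_t c2 = t < s;      // Overflow from adding b[i].
    r[i] = t;
    carry = c1 | c2;
  }
  return carry;
}

// r = a - b over num words; returns the borrow out.
static uint64_t toy_sub_words(uint64_t *r, const uint64_t *a,
                              const uint64_t *b, size_t num) {
  uint64_t borrow = 0;
  for (size_t i = 0; i < num; i++) {
    uint64_t d = a[i] - b[i];
    uint64_t b1 = a[i] < b[i];  // Borrow from subtracting b[i].
    uint64_t t = d - borrow;
    uint64_t b2 = d < borrow;   // Borrow from subtracting the borrow in.
    r[i] = t;
    borrow = b1 | b2;
  }
  return borrow;
}

// r = (a + b) mod m, assuming a, b < m. tmp is num words of scratch space.
static void toy_mod_add_words(uint64_t *r, const uint64_t *a,
                              const uint64_t *b, const uint64_t *m,
                              uint64_t *tmp, size_t num) {
  uint64_t carry = toy_add_words(r, a, b, num);
  uint64_t borrow = toy_sub_words(tmp, r, m, num);
  // The sum is below m only if there was no carry and r - m borrowed.
  // In that case keep r; otherwise select tmp = r - m with a mask.
  uint64_t keep = 0 - ((carry ^ 1) & borrow);
  for (size_t i = 0; i < num; i++) {
    r[i] = (r[i] & keep) | (tmp[i] & ~keep);
  }
}

// r = (a << n) mod m by n modular doublings. n is treated as public.
static void toy_mod_lshift(uint64_t *r, const uint64_t *a, int n,
                           const uint64_t *m, uint64_t *tmp, size_t num) {
  assert(n >= 0);
  for (size_t i = 0; i < num; i++) {
    r[i] = a[i];
  }
  for (int i = 0; i < n; i++) {
    toy_mod_add_words(r, r, r, m, tmp, num);
  }
}
```

For instance, with the one-word modulus 2^64 - 59 and a = 1, shifting by 64
leaves 59 in r, since 2^64 mod (2^64 - 59) = 59.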