Remove p > q normalization in RSA keys

RSA CRT is tiny bit messier when p < q.
https://boringssl-review.googlesource.com/25263 solved this by
normalizing to p > q. The cost was we sometimes had to compute a new
iqmp.

Modular inversion is expensive. We did it only once per key, but it's
still a performance cliff in per-key costs. When later work moves
freeze_private_key into RSA private key parsing, it will be a
performance cliff in the private key parser.

Instead, just handle p < q in the CRT function. The only difference is
needing one extra reduction before the modular subtraction. Even using
the fully general mod_montgomery function (as opposed to checking p < q,
or using bn_reduce_once when num_bits(p) == num_bits(q)) was not
measurable.

In doing so, I noticed we didn't actually have tests that exercise the
reduction step. I added one to evp_tests.txt, but it is only meaningful
when blinding is disabled. (Another cost of blinding.) When blinding is
enabled, the answers mod p and q are randomized and we hit this case
with about 1.8% probability. See comment in evp_test.txt.

I kept the optimization where we store iqmp in Montgomery form, not
because the optimization matters, but because we need to store a
corrected, fixed-width version of the value anyway, so we may as well
store it in a more convenient form.

M1 Max
Before:
Did 9048 RSA 2048 signing operations in 5033403us (1797.6 ops/sec)
Did 1500 RSA 4096 signing operations in 5009288us (299.4 ops/sec)
After:
Did 9116 RSA 2048 signing operations in 5053802us (1803.8 ops/sec) [+0.3%]
Did 1500 RSA 4096 signing operations in 5008283us (299.5 ops/sec) [+0.0%]

Intel(R) Xeon(R) Gold 6154 CPU @ 3.00GHz
Before:
Did 9282 RSA 2048 signing operations in 5019395us (1849.2 ops/sec)
Did 1302 RSA 4096 signing operations in 5055011us (257.6 ops/sec)
After:
Did 9240 RSA 2048 signing operations in 5024845us (1838.9 ops/sec) [-0.6%]
Did 1302 RSA 4096 signing operations in 5046157us (258.0 ops/sec) [+0.2%]

Bug: 316
Change-Id: Icb90c7d5f5188f9b69a6d7bcc63db13d92ec26d5
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/60705
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: Adam Langley <agl@google.com>
This commit is contained in:
David Benjamin 2023-06-11 16:05:05 -04:00 committed by Boringssl LUCI CQ
parent 02d2715bcc
commit 6a7d8b5472
5 changed files with 51 additions and 58 deletions

View File

@ -125,7 +125,7 @@ static int GetKeyType(FileTest *t, const std::string &name) {
return EVP_PKEY_NONE;
}
static int GetRSAPadding(FileTest *t, int *out, const std::string &name) {
static bool GetRSAPadding(FileTest *t, int *out, const std::string &name) {
if (name == "PKCS1") {
*out = RSA_PKCS1_PADDING;
return true;
@ -138,6 +138,10 @@ static int GetRSAPadding(FileTest *t, int *out, const std::string &name) {
*out = RSA_PKCS1_OAEP_PADDING;
return true;
}
if (name == "None") {
*out = RSA_NO_PADDING;
return true;
}
ADD_FAILURE() << "Unknown RSA padding mode: " << name;
return false;
}

View File

@ -655,6 +655,16 @@ Digest = SHA256
Input = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
Output = 07fa4e3de9c002c41c952dc292ef5a814c4c17dc1a6cf958c4c971e8089676d6661b442270ef9295c41e5385c9628aa1bdee2cc2558b8473ba212f2ba04b9ff2264c19187b9506b1d0a1cc2751844cc8dedf555d62ce81bc0e70bfe83d0184ee964593af91b9b327c0fb272c799148cd8737d412cbf36c2ad25fd66977bf805f
# The input is 1 (mod p) and q - 1 (mod q). If the CRT implementation does not
# account for p < q, the subtraction step will break. However, this test only
# works when RSA blinding is disabled. When blinding is enabled, these values
# are randomized and, instead, either this or the above test will hit this case
# if repeated enough. (It hits it with probability (q-p)^2 / 2pq, or 1.8% here.)
Encrypt = RSA-Swapped
RSAPadding = None
Input = 7ab490e1f5e718ff23a9e738f9867559e3a6ea72332e3bcc1b6f58585218ed815a865dab75e3f44ea550bdef815101d6039251a513fd99188c1916abb94b15ef72de793f6472a342b33a125cfc96a4fc89d140e337f5fe6ff937ed3f34b6b09f5227f598e12faddf4423254acbee748197bed26954502c7484c65e279fed7fed
CheckDecrypt
# Though we will never generate such a key, test that RSA keys where p and q are
# different sizes work properly.
PrivateKey = RSA-PrimeMismatch

View File

@ -103,10 +103,8 @@ struct rsa_st {
// the |freeze_private_key| implementation.
BIGNUM *d_fixed, *dmp1_fixed, *dmq1_fixed;
// inv_small_mod_large_mont is q^-1 mod p in Montgomery form, using |mont_p|,
// if |p| >= |q|. Otherwise, it is p^-1 mod q in Montgomery form, using
// |mont_q|.
BIGNUM *inv_small_mod_large_mont;
// iqmp_mont is q^-1 mod p in Montgomery form, using |mont_p|.
BIGNUM *iqmp_mont;
// num_blindings contains the size of the |blindings| and |blindings_inuse|
// arrays. This member and the |blindings_inuse| array are protected by

View File

@ -243,37 +243,24 @@ static int freeze_private_key(RSA *rsa, BN_CTX *ctx) {
}
// CRT components are only publicly bounded by their corresponding
// moduli's bit lengths. |rsa->iqmp| is unused outside of this one-time
// setup, so we do not compute a fixed-width version of it.
// moduli's bit lengths.
if (!ensure_fixed_copy(&rsa->dmp1_fixed, rsa->dmp1, p_fixed->width) ||
!ensure_fixed_copy(&rsa->dmq1_fixed, rsa->dmq1, q_fixed->width)) {
goto err;
}
// Compute |inv_small_mod_large_mont|. Note that it is always modulo the
// larger prime, independent of what is stored in |rsa->iqmp|.
if (rsa->inv_small_mod_large_mont == NULL) {
BIGNUM *inv_small_mod_large_mont = BN_new();
int ok;
if (BN_cmp(rsa->p, rsa->q) < 0) {
ok = inv_small_mod_large_mont != NULL &&
bn_mod_inverse_secret_prime(inv_small_mod_large_mont, rsa->p,
rsa->q, ctx, rsa->mont_q) &&
BN_to_montgomery(inv_small_mod_large_mont,
inv_small_mod_large_mont, rsa->mont_q, ctx);
} else {
ok = inv_small_mod_large_mont != NULL &&
BN_to_montgomery(inv_small_mod_large_mont, rsa->iqmp,
rsa->mont_p, ctx);
}
if (!ok) {
BN_free(inv_small_mod_large_mont);
// Compute |iqmp_mont|, which is |iqmp| in Montgomery form and with the
// correct bit width.
if (rsa->iqmp_mont == NULL) {
BIGNUM *iqmp_mont = BN_new();
if (iqmp_mont == NULL ||
!BN_to_montgomery(iqmp_mont, rsa->iqmp, rsa->mont_p, ctx)) {
BN_free(iqmp_mont);
goto err;
}
rsa->inv_small_mod_large_mont = inv_small_mod_large_mont;
CONSTTIME_SECRET(
rsa->inv_small_mod_large_mont->d,
sizeof(BN_ULONG) * rsa->inv_small_mod_large_mont->width);
rsa->iqmp_mont = iqmp_mont;
CONSTTIME_SECRET(rsa->iqmp_mont->d,
sizeof(BN_ULONG) * rsa->iqmp_mont->width);
}
}
}
@ -302,8 +289,8 @@ void rsa_invalidate_key(RSA *rsa) {
rsa->dmp1_fixed = NULL;
BN_free(rsa->dmq1_fixed);
rsa->dmq1_fixed = NULL;
BN_free(rsa->inv_small_mod_large_mont);
rsa->inv_small_mod_large_mont = NULL;
BN_free(rsa->iqmp_mont);
rsa->iqmp_mont = NULL;
for (size_t i = 0; i < rsa->num_blindings; i++) {
BN_BLINDING_free(rsa->blindings[i]);
@ -798,42 +785,37 @@ static int mod_exp(BIGNUM *r0, const BIGNUM *I, RSA *rsa, BN_CTX *ctx) {
goto err;
}
// Implementing RSA with CRT in constant-time is sensitive to which prime is
// larger. Canonicalize fields so that |p| is the larger prime.
const BIGNUM *dmp1 = rsa->dmp1_fixed, *dmq1 = rsa->dmq1_fixed;
const BN_MONT_CTX *mont_p = rsa->mont_p, *mont_q = rsa->mont_q;
if (BN_cmp(rsa->p, rsa->q) < 0) {
mont_p = rsa->mont_q;
mont_q = rsa->mont_p;
dmp1 = rsa->dmq1_fixed;
dmq1 = rsa->dmp1_fixed;
}
// Use the minimal-width versions of |n|, |p|, and |q|. Either works, but if
// someone gives us non-minimal values, these will be slightly more efficient
// on the non-Montgomery operations.
const BIGNUM *n = &rsa->mont_n->N;
const BIGNUM *p = &mont_p->N;
const BIGNUM *q = &mont_q->N;
const BIGNUM *p = &rsa->mont_p->N;
const BIGNUM *q = &rsa->mont_q->N;
// This is a pre-condition for |mod_montgomery|. It was already checked by the
// caller.
assert(BN_ucmp(I, n) < 0);
if (// |m1| is the result modulo |q|.
!mod_montgomery(r1, I, q, mont_q, p, ctx) ||
!BN_mod_exp_mont_consttime(m1, r1, dmq1, q, ctx, mont_q) ||
!mod_montgomery(r1, I, q, rsa->mont_q, p, ctx) ||
!BN_mod_exp_mont_consttime(m1, r1, rsa->dmq1_fixed, q, ctx,
rsa->mont_q) ||
// |r0| is the result modulo |p|.
!mod_montgomery(r1, I, p, mont_p, q, ctx) ||
!BN_mod_exp_mont_consttime(r0, r1, dmp1, p, ctx, mont_p) ||
// Compute r0 = r0 - m1 mod p. |p| is the larger prime, so |m1| is already
// fully reduced mod |p|.
!bn_mod_sub_consttime(r0, r0, m1, p, ctx) ||
!mod_montgomery(r1, I, p, rsa->mont_p, q, ctx) ||
!BN_mod_exp_mont_consttime(r0, r1, rsa->dmp1_fixed, p, ctx,
rsa->mont_p) ||
// Compute r0 = r0 - m1 mod p. |m1| is reduced mod |q|, not |p|, so we
// just run |mod_montgomery| again for simplicity. This could be more
// efficient with more cases: if |p > q|, |m1| is already reduced. If
// |p < q| but they have the same bit width, |bn_reduce_once| suffices.
// However, compared to over 2048 Montgomery multiplications above, this
// difference is not measurable.
!mod_montgomery(r1, m1, p, rsa->mont_p, q, ctx) ||
!bn_mod_sub_consttime(r0, r0, r1, p, ctx) ||
// r0 = r0 * iqmp mod p. We use Montgomery multiplication to compute this
// in constant time. |inv_small_mod_large_mont| is in Montgomery form and
// r0 is not, so the result is taken out of Montgomery form.
!BN_mod_mul_montgomery(r0, r0, rsa->inv_small_mod_large_mont, mont_p,
ctx) ||
// in constant time. |iqmp_mont| is in Montgomery form and r0 is not, so
// the result is taken out of Montgomery form.
!BN_mod_mul_montgomery(r0, r0, rsa->iqmp_mont, rsa->mont_p, ctx) ||
// r0 = r0 * q + m1 gives the final result. Reducing modulo q gives m1, so
// it is correct mod p. Reducing modulo p gives (r0-m1)*iqmp*q + m1 = r0,
// so it is correct mod q. Finally, the result is bounded by [m1, n + m1),
@ -1308,8 +1290,7 @@ static int RSA_generate_key_ex_maybe_fips(RSA *rsa, int bits,
replace_bignum(&rsa->d_fixed, &tmp->d_fixed);
replace_bignum(&rsa->dmp1_fixed, &tmp->dmp1_fixed);
replace_bignum(&rsa->dmq1_fixed, &tmp->dmq1_fixed);
replace_bignum(&rsa->inv_small_mod_large_mont,
&tmp->inv_small_mod_large_mont);
replace_bignum(&rsa->iqmp_mont, &tmp->iqmp_mont);
rsa->private_key_frozen = tmp->private_key_frozen;
ret = 1;

View File

@ -983,7 +983,7 @@ TEST(RSATest, KeygenFail) {
EXPECT_FALSE(rsa->d_fixed);
EXPECT_FALSE(rsa->dmp1_fixed);
EXPECT_FALSE(rsa->dmq1_fixed);
EXPECT_FALSE(rsa->inv_small_mod_large_mont);
EXPECT_FALSE(rsa->iqmp_mont);
EXPECT_FALSE(rsa->private_key_frozen);
// Failed key generations leave the previous contents alone.