Remove `SmallerModulus` and instead do the check dynamically. This
eliminates the last `unsafe impl` regarding the modulus
relationships. The uses of `elem_widen` won't ever fail but since
they are in an already-fallible function they won't hurt.
The dynamic checks should never fail but since they are added in
already-fallible functions they won't cause any trouble. This
facilitates future changes where the dynamic checks are required.
This saves two private-modulus-length multiplications per RSA
private key operation at the cost of two private-modulus-length
squarings per `RsaKeyPair` construction.
Split the checking of the private modulus from the checking of the
private exponent so that we can do things in the order recommended
in the NIST spec.
This also facilitates storing R**3 instead of R**2 in the
`RsaKeyPair`. (We need R**2 during `RsaKeyPair` construction, but
R**3 afterwards.)
This was necessary at some point in the past, but no longer is. It is
better to avoid depending on any of the `core::fmt` machinery in these
lower layers if we can avoid it.
`PublicModulus` and `PrivatePrime` are basically duplicates of
`OwnedModulusWithOne`. In the future we would like to create an
`OwnedModulus` that doesn't need 1RR to be calculated. Also in the
future we'd like to be able to "take" 1RR from a public modulus.
This change is a step towards those ends.
Use the pattern we typically use where one argument is passed by value.
This lets us use `limbs_add_assign_mod`, eliminating the `unsafe`
direct use of `LIMBS_add_mod`. This will make future refactoring easier.
This also eliminates the need to construct and zeroize a new scalar `r`
for the result.
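
A minimal sketch of the by-value pattern described above, assuming a
single-limb stand-in for the scalar. `Scalar`, `scalar_add`,
`add_assign_mod`, and the modulus `N` are hypothetical names for
illustration only; the real code works on multi-limb values and must be
constant-time, which this sketch is not.

```rust
// Hypothetical single-limb stand-in for a scalar; not the real type.
#[derive(Debug, PartialEq)]
struct Scalar(u64);

// Illustrative modulus (an assumption, not a real curve order).
const N: u64 = 0xffff_ffff_0000_0001;

// In-place modular addition: *a = (*a + b) mod m.
// Requires *a < m and b < m. NOT constant-time; illustration only.
fn add_assign_mod(a: &mut u64, b: u64, m: u64) {
    let (sum, carry) = a.overflowing_add(b);
    // The true sum is below 2*m, so one conditional subtraction suffices.
    *a = if carry || sum >= m { sum.wrapping_sub(m) } else { sum };
}

// `a` is passed by value and reused as the result, so no fresh scalar has
// to be constructed (and later zeroized) to hold the sum.
fn scalar_add(mut a: Scalar, b: &Scalar) -> Scalar {
    add_assign_mod(&mut a.0, b.0, N);
    a
}

fn main() {
    let sum = scalar_add(Scalar(N - 1), &Scalar(N - 1));
    assert_eq!(sum, Scalar(N - 2));
}
```

Taking `a` by value means the caller gives up its copy, which is usually
what is wanted for secret intermediate values.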
Note: I originally tried an alternative implementation using `flat_map` that
ended up being materially slower. To fix that performance regression I had to
make the following change:
```diff
     let mut output = Output([0; MAX_OUTPUT_LEN]);
     output
         .0
-        .iter_mut()
-        .zip(input.iter().copied().flat_map(|Wrapping(w)| f(w)))
+        .chunks_mut(N)
+        .zip(input.iter().copied().map(|Wrapping(w)| f(w)))
         .for_each(|(o, i)| {
-            *o = i;
+            o.copy_from_slice(&i);
         });
     output
 }
```
I verified that this generates the same assembly code as the original code
on x86-64 using Rust 1.74.0, except that there are two additional 128-bit
moves in `sha256_format_output` to zero out the latter half of `Output`,
which was intended.
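
For reference, a self-contained sketch of the `chunks_mut` form. The
function name `format_output`, its signature, and the `MAX_OUTPUT_LEN`
value of 64 are assumptions made for illustration; only the body of the
iterator chain is taken from the diff above.

```rust
use core::num::Wrapping;

// Assumed buffer size for this sketch.
const MAX_OUTPUT_LEN: usize = 64;

struct Output([u8; MAX_OUTPUT_LEN]);

// Hypothetical helper: serialize each state word into N bytes with `f` and
// copy those bytes into the front of a zero-initialized buffer.
// N must divide MAX_OUTPUT_LEN; otherwise a short final chunk could make
// `copy_from_slice` panic.
fn format_output<T: Copy, const N: usize>(
    input: &[Wrapping<T>],
    f: impl Fn(T) -> [u8; N],
) -> Output {
    let mut output = Output([0; MAX_OUTPUT_LEN]);
    output
        .0
        .chunks_mut(N)
        .zip(input.iter().copied().map(|Wrapping(w)| f(w)))
        .for_each(|(o, i)| {
            // One N-byte copy per word instead of N single-byte stores.
            o.copy_from_slice(&i);
        });
    output
}

fn main() {
    let state = [Wrapping(0x0102_0304u32), Wrapping(0x0506_0708u32)];
    let out = format_output(&state, u32::to_be_bytes);
    assert_eq!(&out.0[..8], &[1, 2, 3, 4, 5, 6, 7, 8]);
    // Bytes past the serialized words stay zero.
    assert!(out.0[8..].iter().all(|&b| b == 0));
}
```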
Clarify how the math works, and use a slightly better trade-off of
doubling vs squaring. On 64-bit targets RSA verification is now
less than 10% faster. On 32-bit targets it's over 20% faster. I
expect that we can improve the performance further by optimizing
the doubling implementation.
Also the new implementation avoids allocating/cloning any temporary
`Elem`s, unlike the previous implementation.
Save two private-modulus Montgomery multiplications per RSA exponentiation
at the cost of approximately two modulus-wide XORs.
The new `oneR()` is extracted from the Montgomery RR setup.
Remove the use of `One<RR>` in `elem_exp_consttime`.
Eliminate one modular doubling in Montgomery RR setup. This saves one
public modulus modular doubling per RSA signature verification, at the
cost of approximately one public-modulus-wide XOR. RsaKeyPair also sees
similar savings per Modulus.
This Cargo feature treats a user-provided `getrandom` implementation as
a secure random number generator (`SecureRandom`). The feature only has
effect on targets not supported by `getrandom`.
I agree to license my contributions to each file under the terms given
at the top of each file I changed.
Previously we swapped p and q and calculated a new qInv if p < q so
that we could avoid doing a reduction during the CRT computation.
Instead, just do the reduction during CRT as it's cheap. This
notably reduces the number of operations we need in `bigint`, and
it eliminates the need for the `Prime` modulus marker type.
Now there are more things that can go wrong during CRT. First, we
may wrongly forget to reduce m_2 mod p; before, this wasn't necessary
since every value less than q was also less than p. Next, we may wrongly
use the value of m_2 mod p instead of m_2 later; before, we could
do this since m_2 mod p == m_2 whenever m_2 < q < p. Add
tests for these cases.
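
A toy sketch of the two pitfalls, using small `u64` values rather than
`bigint` types. `ToyKey`, `TOY_KEY`, `pow_mod`, and `crt_private_op` are
hypothetical names; the parameters (p = 11, q = 13, d = 103) are chosen
only so that p < q, and nothing here is constant-time or suitable for
real keys.

```rust
// Square-and-multiply modular exponentiation; fine for toy-sized values.
fn pow_mod(mut base: u64, mut exp: u64, m: u64) -> u64 {
    let mut acc = 1 % m;
    base %= m;
    while exp > 0 {
        if exp & 1 == 1 {
            acc = acc * base % m;
        }
        base = base * base % m;
        exp >>= 1;
    }
    acc
}

// Hypothetical toy key: dP = d mod (p-1), dQ = d mod (q-1), qInv = q^-1 mod p.
struct ToyKey { p: u64, q: u64, d_p: u64, d_q: u64, q_inv: u64 }

// n = 143, d = 103 (matches e = 7); note p < q, so nothing is swapped.
const TOY_KEY: ToyKey = ToyKey { p: 11, q: 13, d_p: 3, d_q: 7, q_inv: 6 };

fn crt_private_op(k: &ToyKey, c: u64) -> u64 {
    let m_1 = pow_mod(c, k.d_p, k.p);
    let m_2 = pow_mod(c, k.d_q, k.q);
    // Pitfall 1: m_2 may be >= p when p < q, so reduce it before subtracting.
    let m_2_mod_p = m_2 % k.p;
    let h = k.q_inv * ((m_1 + k.p - m_2_mod_p) % k.p) % k.p;
    // Pitfall 2: recombine with the unreduced m_2, not m_2 mod p.
    m_2 + h * k.q
}

fn main() {
    // c = 12 gives m_2 = 12 >= p = 11, so both pitfalls are exercised.
    assert_eq!(crt_private_op(&TOY_KEY, 12), pow_mod(12, 103, 143));
    // The CRT result agrees with the direct computation for every input.
    assert!((0..143).all(|c| crt_private_op(&TOY_KEY, c) == pow_mod(c, 103, 143)));
}
```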
Rewrite the tests for `elem_reduced_once` given its new constraints.
QQ comprised almost 25% of the bulk of RsaKeyPair and is actually
completely unnecessary since `elem_reduced` can do the whole
reduction itself.
This has the nice and important side effect of eliminating some
conversion operations between `bigint` types.
This is also a step towards eliminating some of the `unsafe trait`
stuff that kinda-but-not-really modeled modulus relationships.
Move all the checks that are done for each private prime into
the `PrivatePrime` constructor, to eliminate duplication.
This causes the 512-bit-ness check to be done earlier than before,
which affects some of the tests.
Originally we only had `Modulus`. Then we had a need for a
temporary `Modulus` without `oneRR` so we created `PartialModulus`.
However, there is really nothing "partial" about them. So, improve
the naming by renaming `PartialModulus` to `Modulus` and `Modulus`
to `OwnedModulusWithOne`. In the future we may refactor things
further to separate the ownership aspect from the "has oneRR"
aspect.
Instead of just doing a straightforward rename, take this
opportunity to refactor the code so that it uses the new `Modulus`
whenever `oneRR()` isn't used. This eliminates the duplication of
the APIs of the two modulus types, and the duplication of
`elem_mul` and `elem_mul_`.
The original idea of `Width` was that we'd support operations that
worked on multiple same-width but different-modulus values, and/or
we'd support splitting a 2N-limb `BoxedLimb` into two N-limb
`&[Limb]`, etc. However, as things are now, `Width` doesn't really
serve a useful purpose.
When we added `rsa::PublicKey` we changed the `ring::signature` RSA
implementation to construct an `rsa::PublicKey` and then verify the
signature using it. Unfortunately, for backward compatibility with old
uses of `RsaKeyPair`, the `rsa::PublicKey` constructor constructs (and
allocates) a copy of the ASN.1-serialized public key. This is not
acceptable for users who are using `ring::signature` to verify a
single signature. Refactor `PublicKey` so that it can be bypassed
by the `ring::signature` implementation.
This is a step towards implementing allocation-free RSA signature
verification.
For now, just put `#[allow(...)]` directives in the places where the
conversions are done. We'll follow up in the future with the correct
replacement for `as` for each case, as several PRs.
When I generated these test vectors, I gave all of them the same point: the
generator of the curve. Consequently these input files are 100% redundant
with the `point_mul_base.txt` input files. So just remove them and use the
`point_mul_base.txt` files instead.
Avoid using the P384_POINT type on the C side. It seems to work for all
the targets we support, for P-384, but this pattern probably doesn't
work in general. Especially due to alignment issues for 32-bit targets,
it is doubtful it would work for P-521.
When `elem_exp_consttime` replaced `BN_mod_exp_mont_consttime` I did
not fully understand the way the table was constructed in the original
function. Recent BoringSSL changes clarify the table construction. Do
it the same way, to restore performance to what it was previously.
This addresses the `// TODO: Optimize this to avoid gathering`.
When this code was written, it wasn't clear which assembly language
functions took a pointer to the entire state vs. just a pointer to
the accumulator (etc.). Now upstream clarified things and we can
clarify this code.