Remove `SmallerModulus` and instead do the check dynamically. This
eliminates the last `unsafe impl` regarding the modulus
relationships. The uses of `elem_widen` won't ever fail but since
they are in an already-fallible function they won't hurt.
The dynamic checks should never fail but since they are added in
already-fallible functions they won't cause any trouble. This
facilitates future changes where the dynamic checks are required.
This saves two private-modulus-length multiplications per RSA
private key operation at the cost of two private-modulus-length
squarings per `RsaKeyPair` construction.
Split the checking of the private modulus from the checking of the
private exponent so that we can do things in the order recommended
in the NIST spec.
This also facilitates storing R**3 instead of R**2 in the
`RsaKeyPair`. (We need R**2 during `RsaKeyPair` construction, but
R**3 afterwards.)
This was necessary at some point in the past, but no longer is. It is
better to avoid depending on any of the `core::fmt` machinery in these
lower layers if we can avoid it.
`PublicModulus` and `PrivatePrime` are basically duplicates of
`OwnedModulusWithOne`. In the future we would like to create an
`OwnedModulus` that doesn't need 1RR to be calculated. Also in the
future we'd like to be able to "take" 1RR from a public modulus.
This change is a step towards those ends.
Use the pattern we typically use where one argument is passed by value.
This lets us use `limbs_add_assign_mod`, eliminating the `unsafe`
direct use of `LIMBS_add_mod`. This will make future refactoring easier.
This also eliminates the need to construct and zeroize a new scalar `r`
for the result.
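
As a rough illustration of the by-value pattern described above, here is a
minimal sketch over a single `u64` "limb". `Scalar`, `scalar_add`, and
`add_assign_mod` are hypothetical names for this sketch only; the real
`limbs_add_assign_mod` works on multi-limb values and is constant-time,
which this sketch is not.

```rust
/// Hypothetical stand-in for a scalar; the real type holds several limbs.
#[derive(Debug, PartialEq)]
struct Scalar(u64);

/// In-place `*a = (*a + b) mod m`, assuming `*a < m` and `b < m`.
/// NOT constant-time; for illustration only.
fn add_assign_mod(a: &mut u64, b: u64, m: u64) {
    debug_assert!(*a < m && b < m);
    let (sum, carry) = a.overflowing_add(b);
    // Subtract `m` once if the addition wrapped or the sum is out of range.
    *a = if carry || sum >= m { sum.wrapping_sub(m) } else { sum };
}

/// Takes `a` by value and reuses its storage for the result, so no new
/// scalar needs to be constructed (or zeroized) for the sum.
fn scalar_add(mut a: Scalar, b: &Scalar, m: u64) -> Scalar {
    add_assign_mod(&mut a.0, b.0, m);
    a
}
```

The caller gives up ownership of `a`, e.g. `scalar_add(Scalar(5), &Scalar(9), 11)`
yields `Scalar(3)`.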
Note: I originally tried an alternative implementation using `flat_map` that
ended up being materially slower. To fix that performance regression I had to
make the following change:
```
let mut output = Output([0; MAX_OUTPUT_LEN]);
output
.0
- .iter_mut()
- .zip(input.iter().copied().flat_map(|Wrapping(w)| f(w)))
+ .chunks_mut(N)
+ .zip(input.iter().copied().map(|Wrapping(w)| f(w)))
.for_each(|(o, i)| {
- *o = i;
+ o.copy_from_slice(&i);
});
output
}
```
I verified that this generates the same assembly code as the original code
on x86-64 using Rust 1.74.0, except that there are two additional 128-bit
moves in `sha256_format_output` to zero out the latter half of `Output`,
which was intended.
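
For reference, the `chunks_mut` form of the pattern above can be sketched as
a standalone function. This is a simplified sketch: `format_output`, the
plain byte array return type, and the value of `MAX_OUTPUT_LEN` are
assumptions made for illustration and do not match the real code exactly.

```rust
use core::num::Wrapping;

// Assumed value for this sketch only.
const MAX_OUTPUT_LEN: usize = 64;

/// Serializes each state word with `f` into consecutive `N`-byte chunks of
/// a zero-initialized buffer. Bytes past the last serialized word stay zero.
fn format_output<const N: usize>(
    input: &[Wrapping<u32>],
    f: impl Fn(u32) -> [u8; N],
) -> [u8; MAX_OUTPUT_LEN] {
    let mut output = [0u8; MAX_OUTPUT_LEN];
    output
        .chunks_mut(N)
        .zip(input.iter().copied().map(|Wrapping(w)| f(w)))
        .for_each(|(o, i)| {
            // One fixed-size copy per word instead of one store per byte.
            o.copy_from_slice(&i);
        });
    output
}
```

Calling `format_output(&words, u32::to_be_bytes)` writes each word as four
big-endian bytes; `MAX_OUTPUT_LEN` must be a multiple of `N` for every chunk
to have the length `copy_from_slice` expects.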
Clarify how the math works, and use a slightly better trade-off of
doubling vs squaring. On 64-bit targets RSA verification is now
less than 10% faster. On 32-bit targets it's over 20% faster. I
expect that we can improve the performance further by optimizing
the doubling implementation.
Also the new implementation avoids allocating/cloning any temporary
`Elem`s, unlike the previous implementation.
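
To illustrate the doubling-vs-squaring trade-off in isolation, here is a
minimal sketch using plain (non-Montgomery) arithmetic on a single `u64`
modulus. All function names, and the idea of passing the number of squarings
as a parameter, are assumptions for illustration; this is not the actual
`bigint` code.

```rust
/// `a * b mod m`, using a 128-bit intermediate product.
fn mul_mod(a: u64, b: u64, m: u64) -> u64 {
    ((u128::from(a) * u128::from(b)) % u128::from(m)) as u64
}

/// `2 * a mod m`.
fn double_mod(a: u64, m: u64) -> u64 {
    mul_mod(a, 2, m)
}

/// `2^exponent mod m` using `exponent` modular doublings and no squarings.
fn pow2_by_doubling(exponent: u32, m: u64) -> u64 {
    (0..exponent).fold(1 % m, |acc, _| double_mod(acc, m))
}

/// `2^exponent mod m` using `exponent / 2^squarings` doublings followed by
/// `squarings` modular squarings. `exponent` must be divisible by
/// `2^squarings`.
fn pow2_by_doubling_then_squaring(exponent: u32, squarings: u32, m: u64) -> u64 {
    assert_eq!(exponent % (1 << squarings), 0);
    let base = pow2_by_doubling(exponent >> squarings, m);
    (0..squarings).fold(base, |acc, _| mul_mod(acc, acc, m))
}
```

Each squaring halves the number of doublings that are needed, so the best
split depends on the relative cost of one doubling versus one squaring.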
Save two private-modulus Montgomery multiplications per RSA exponentiation
at the cost of approximately two modulus-wide XORs.
The new `oneR()` is extracted from the Montgomery RR setup.
Remove the use of `One<RR>` in `elem_exp_consttime`.
Eliminate one modular doubling in Montgomery RR setup. This saves one
public modulus modular doubling per RSA signature verification, at the
cost of approximately one public-modulus-wide XOR. RsaKeyPair also sees
similar savings per Modulus.
This Cargo feature treats a user-provided `getrandom` implementation as
a secure random number generator (`SecureRandom`). The feature only has
effect on targets not supported by `getrandom`.
I agree to license my contributions to each file under the terms given
at the top of each file I changed.
Values for P-521 have an odd number of limbs in 32-bit mode, which
means we can't keep using `TOBN`, and also Montgomery-encoded
values are different for 32-bit and 64-bit.
Generate some of the C boilerplate, particularly the large constants.
The output is written into target/curves/, and can be merged into
the actual code in crypto/fipsmodule/ec/ using a two-way merge tool;
this is the same as the Rust code generation.
Changes to gfp_p{256,384}.c are due to differences in the generator's
output:
* The generator doesn't generate trailing commas in arrays.
* The generator consistently avoids adding leading zeros to hex
constants, and consistently formats values less than 10 in decimal;
the existing code used a mix of styles.
* The generator wraps arrays consistently; the existing code used a
mix of wrapping styles.
* The generator does not nest constants in the functions that need
them. This was changed to support future refactorings.
Previously we swapped p and q and calculated a new qInv if p < q so
that we could avoid doing a reduction during the CRT computation.
Instead, just do the reduction during CRT as it's cheap. This
notably reduces the number of operations we need in `bigint`, and
it eliminates the need for the `Prime` modulus marker type.
Now there are more things that can go wrong during CRT. First, we
may wrongly forget to reduce m_2 mod p; before this wasn't necessary
since every element of q was an element of p. Next, we may wrongly
use the value of m_2 mod p instead of m_2 later; before we could
do this since previously m_2 mod p == m_2 since m_2 < q < p. Add
tests for these cases.
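
The two pitfalls above can be seen in a toy CRT computation on `u64`
values. This is a sketch only: `ToyKey`, `mod_mul`, `mod_pow`, and
`crt_private_op` are hypothetical names, the arithmetic is neither
constant-time nor Montgomery-based, and real keys are far larger.

```rust
/// Toy RSA private key components; field names are illustrative only.
struct ToyKey {
    p: u64,
    q: u64,
    d_p: u64,   // d mod (p - 1)
    d_q: u64,   // d mod (q - 1)
    q_inv: u64, // q^-1 mod p
}

fn mod_mul(a: u64, b: u64, m: u64) -> u64 {
    ((u128::from(a) * u128::from(b)) % u128::from(m)) as u64
}

/// Square-and-multiply `base^exp mod m`.
fn mod_pow(base: u64, mut exp: u64, m: u64) -> u64 {
    let mut base = base % m;
    let mut acc = 1 % m;
    while exp > 0 {
        if exp & 1 == 1 {
            acc = mod_mul(acc, base, m);
        }
        base = mod_mul(base, base, m);
        exp >>= 1;
    }
    acc
}

/// Computes `c^d mod (p * q)` via CRT without assuming `q < p`.
fn crt_private_op(c: u64, key: &ToyKey) -> u64 {
    let m_1 = mod_pow(c, key.d_p, key.p);
    let m_2 = mod_pow(c, key.d_q, key.q);
    // Pitfall 1: m_2 < q, but q may exceed p, so m_2 must be reduced mod p
    // before it is subtracted from m_1.
    let m_2_mod_p = m_2 % key.p;
    let diff = (m_1 + key.p - m_2_mod_p) % key.p;
    let h = mod_mul(key.q_inv, diff, key.p);
    // Pitfall 2: the final step must use the unreduced m_2, not m_2 mod p.
    m_2 + h * key.q
}
```

For example, with the toy key p = 11, q = 13 (so p < q), e = 7, d = 103,
dP = 3, dQ = 7, and qInv = 6, `crt_private_op(c, &key)` agrees with
`mod_pow(c, 103, 143)` for every `c` below 143.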
Rewrite the tests for `elem_reduced_once` given its new constraints.
QQ comprised almost 25% of the bulk of RsaKeyPair and is actually
completely unnecessary since `elem_reduced` can do the whole
reduction itself.
This has the nice and important side effect of eliminating some
conversion operations between `bigint` types.
This is also a step towards eliminating some of the `unsafe trait`
stuff that kinda-but-not-really modeled modulus relationships.
Move all the checks that are done for each private prime into
the `PrivatePrime` constructor, to eliminate duplication.
This causes the 512-bit-ness check to be done earlier than before,
which affects some of the tests.
Originally we only had `Modulus`. Then we had a need for a
temporary `Modulus` without `oneRR` so we created `PartialModulus`.
However, there is really nothing "partial" about them. So, improve
the naming by renaming `PartialModulus` to `Modulus` and `Modulus`
to `OwnedModulusWithOne`. In the future we may refactor things
further to separate the ownership aspect from the "has oneRR"
aspect.
Instead of just doing a straightforward rename, take this
opportunity to refactor the code so that it uses the new `Modulus`
whenever `oneRR()` isn't used. This eliminates the duplication of
the APIs of the two modulus types, and the duplication of
`elem_mul` and `elem_mul_`.
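
The shape of the split described above might look roughly like the
following. This is a hypothetical sketch: the field layout, the `u64` limb
type, and the `is_reduced` helper are assumptions made for illustration and
are not the actual `bigint` definitions.

```rust
/// Hypothetical: owns the modulus limbs plus the precomputed oneRR value.
struct OwnedModulusWithOne {
    limbs: Vec<u64>,  // little-endian limbs of the modulus
    one_rr: Vec<u64>, // precomputed R*R mod m (placeholder in this sketch)
}

/// Hypothetical: a borrowed view of a modulus that carries no oneRR.
struct Modulus<'a> {
    limbs: &'a [u64],
}

impl OwnedModulusWithOne {
    /// Borrow as a plain `Modulus` for operations that don't need oneRR.
    fn modulus(&self) -> Modulus<'_> {
        Modulus { limbs: &self.limbs }
    }

    /// Only code that really needs oneRR touches the owning type.
    fn one_rr(&self) -> &[u64] {
        &self.one_rr
    }
}

/// An operation that doesn't use oneRR takes `&Modulus`, so it only needs
/// to be written once. Returns whether `a < m` (NOT constant-time).
fn is_reduced(a: &[u64], m: &Modulus<'_>) -> bool {
    // Compare most-significant limb first.
    a.len() == m.limbs.len() && a.iter().rev().lt(m.limbs.iter().rev())
}
```

With this shape, a function such as `is_reduced` works for any modulus,
whether or not oneRR has been computed for it.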
The original idea of `Width` was that we'd support operations that
worked on multiple same-width but different-modulus values, and/or
we'd support splitting a 2N-limb `BoxedLimb` into two N-limb
`&[Limb]`, etc. However, as things are now, `Width` doesn't really
serve a useful purpose.
When we added `rsa::PublicKey` we changed the `ring::signature` RSA
implementation to construct an `rsa::PublicKey` and then verify the
signature using it. Unfortunately, for backward compatibility with old
uses of `RsaKeyPair`, the `rsa::PublicKey` constructor constructs (and
allocates) a copy of the ASN.1-serialized public key. This is not
acceptable for users who are using `ring::signature` to verify a
single signature. Refactor `PublicKey` so that it can be bypassed
by the `ring::signature` implementation.
This is a step towards implementing allocation-free RSA signature
verification.
Align with the other use of `OPENSSL_memcpy` in `curve25519_64_adx.h`.
`string.h` will no longer be needed.
Signed-off-by: Jiaqi Gao <jiaqi.gao@intel.com>
For now, just put `#[allow(...)]` directives in the places where the
conversions are done. We'll follow up in the future with the correct
replacement for `as` for each case, as several PRs.
Do it because BoringSSL does it. BoringSSL has some other headers it
includes here but we intentionally do not have them and/or we
intentionally do not include them here (string.h and assert.h).