Commit f932b941bd1f59782cb3db8f7cd7b8b2c9842ee9 was incomplete and
wrong. On targets where we do any static or dynamic feature detection
and where we have these global variables, we need to unconditionally
write the detected features to the global variable so that assembly
can see them. Since we do static feature detection regardless of
operating system, the initialization of the global must be done
without any conditions on the operating system.
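A minimal sketch of the shape of that fix, with purely illustrative names
(this is not ring's actual cpu-feature code or its real globals):
```rust
use core::sync::atomic::{AtomicU32, Ordering};

// Stand-in for the global capability word that the assembly routines read.
static CAPS: AtomicU32 = AtomicU32::new(0);

// Features guaranteed at compile time by the target configuration.
const STATIC_FEATURES: u32 = 0b0001;

// OS-specific runtime probing; on some operating systems this legitimately
// reports nothing extra.
fn detect_dynamic_features() -> u32 {
    0
}

pub fn init() {
    // The store is unconditional: there is no #[cfg(target_os = "...")] guard,
    // so statically-detected features always reach the assembly, even on
    // operating systems where no dynamic detection is performed.
    CAPS.store(STATIC_FEATURES | detect_dynamic_features(), Ordering::Relaxed);
}

fn main() {
    init();
    assert_ne!(CAPS.load(Ordering::Relaxed) & STATIC_FEATURES, 0);
}
```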
GitHub Actions runners already have rustup with the stable toolchain
installed, apparently. actions-rs is going away and we don't want to
keep maintaining a fork with an unsupported upstream, so start the
process of dropping it.
We want all of our internal symbols to actually be internal, so that
none of them leak from a static/dynamic library that is built with
*ring* inside.
Check `d` by processing it as an `OwnedModulus` like we do for the
other moduli. This should make the checking more consistent.
As a nice side effect, this eliminates the last non-test usage of
`Nonnegative` and eliminates more now-dead `Nonnegative` code.
Remove `SmallerModulus` and instead do the check dynamically. This
eliminates the last `unsafe impl` regarding the modulus
relationships. The uses of `elem_widen` won't ever fail, but since
they are in an already-fallible function they won't hurt.
The dynamic checks should never fail but since they are added in
already-fallible functions they won't cause any trouble. This
facilitates future changes where the dynamic checks are required.
This saves two private-modulus-length multiplications per RSA
private key operation at the cost of two private-modulus-length
squarings per `RsaKeyPair` construction.
Split the checking of the private modulus from the checking of the
private exponent so that we can do things in the order recommended
in the NIST spec.
This also facilitates storing R**3 instead of R**2 in the
`RsaKeyPair`. (We need R**2 during `RsaKeyPair` construction, but
R**3 afterwards.)
This was necessary at some point in the past, but no longer is. It is
better to avoid depending on any of the `core::fmt` machinery in these
lower layers if we can.
`PublicModulus` and `PrivatePrime` are basically duplicates of
`OwnedModulusWithOne`. In the future we would like to create an
`OwnedModulus` that doesn't need 1RR to be calculated. Also in the
future we'd like to be able to "take" 1RR from a public modulus.
This change is a step towards those ends.
Use the pattern we typically use where one argument is passed by value.
This lets us use `limbs_add_assign_mod`, eliminating the `unsafe`
direct use of `LIMBS_add_mod`. This will make future refactoring easier.
This also eliminates the need to construct and zeroize a new scalar `r`
for the result.
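A hedged, self-contained sketch of the shape of that operation (this is not
ring's implementation, and the constant-time details are omitted): the sum is
written back into the first operand in place, so no separate result scalar
needs to be allocated and zeroized:
```rust
type Limb = u64;

// a += b (mod m), over little-endian limb slices, assuming a < m and b < m.
fn limbs_add_assign_mod(a: &mut [Limb], b: &[Limb], m: &[Limb]) {
    assert!(a.len() == b.len() && a.len() == m.len());
    // a += b, tracking the carry out of the top limb.
    let mut carry = 0u64;
    for (ai, &bi) in a.iter_mut().zip(b) {
        let (s1, c1) = ai.overflowing_add(bi);
        let (s2, c2) = s1.overflowing_add(carry);
        *ai = s2;
        carry = u64::from(c1) | u64::from(c2);
    }
    // If the sum overflowed or is >= m, subtract m once. (The real code would
    // do this in constant time; this sketch uses an ordinary branch.)
    let ge_m = a.iter().rev().cmp(m.iter().rev()) != core::cmp::Ordering::Less;
    if carry != 0 || ge_m {
        let mut borrow = 0u64;
        for (ai, &mi) in a.iter_mut().zip(m) {
            let (d1, b1) = ai.overflowing_sub(mi);
            let (d2, b2) = d1.overflowing_sub(borrow);
            *ai = d2;
            borrow = u64::from(b1) | u64::from(b2);
        }
    }
}

fn main() {
    let m = [13u64];
    let mut a = [9u64];
    limbs_add_assign_mod(&mut a, &[7], &m);
    assert_eq!(a, [3]); // (9 + 7) mod 13
}
```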
Note: I originally tried an alternative implementation using `flat_map` that
ended up being materially slower. To fix that performance regression I had to
make the following change:
```
     let mut output = Output([0; MAX_OUTPUT_LEN]);
     output
         .0
-        .iter_mut()
-        .zip(input.iter().copied().flat_map(|Wrapping(w)| f(w)))
+        .chunks_mut(N)
+        .zip(input.iter().copied().map(|Wrapping(w)| f(w)))
         .for_each(|(o, i)| {
-            *o = i;
+            o.copy_from_slice(&i);
         });
     output
 }
```
I verified that this generates the same assembly code as the original code
on x86-64 using Rust 1.74.0, except that there are two additional 128-bit
moves in `sha256_format_output` to zero out the latter half of `Output`,
which was intended.
Clarify how the math works, and use a slightly better trade-off of
doubling vs squaring. On 64-bit targets RSA verification is now
less than 10% faster. On 32-bit targets it's over 20% faster. I
expect that we can improve the performance further by optimizing
the doubling implementation.
Also the new implementation avoids allocating/cloning any temporary
`Elem`s, unlike the previous implementation.
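For context on the doubling-vs-squaring trade-off (the notation here is mine,
not the commit's): RR = 2^(2t) mod m, with t the modulus width in bits, can be
built from a few modular doublings plus Montgomery squarings, because each
Montgomery squaring doubles the exponent's excess over t:
```
\begin{align*}
  &\text{Let } R = 2^{t},\quad \mathrm{MontMul}(a,b) = a\,b\,R^{-1} \bmod m. \\
  &\text{Start from } R \bmod m \text{ and apply } k \text{ modular doublings: }
    x_0 = 2^{t+k} \bmod m. \\
  &\text{Each Montgomery squaring maps } 2^{t+e} \mapsto 2^{2(t+e)} \cdot 2^{-t}
    = 2^{t+2e} \pmod{m}, \\
  &\text{so after } s \text{ squarings the excess is } k \cdot 2^{s};
    \text{ choosing } k \cdot 2^{s} = t \text{ yields } x_s = 2^{2t} \bmod m = RR.
\end{align*}
```
Fewer initial doublings mean more Montgomery squarings, and vice versa; that
is the trade-off being tuned.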
Save two private-modulus Montgomery multiplications per RSA exponentiation
at the cost of approximately two modulus-wide XORs.
The new `oneR()` is extracted from the Montgomery RR setup.
Remove the use of `One<RR>` in `elem_exp_consttime`.
Eliminate one modular doubling in Montgomery RR setup. This saves one
public modulus modular doubling per RSA signature verification, at the
cost of approximately one public-modulus-wide XOR. RsaKeyPair also sees
similar savings per Modulus.
This Cargo feature treats a user-provided `getrandom` implementation as
a secure random number generator (`SecureRandom`). The feature only has
an effect on targets not supported by `getrandom`.
I agree to license my contributions to each file under the terms given
at the top of each file I changed.
Values for P-521 have an odd number of limbs in 32-bit mode, which
means we can't keep using `TOBN`, and also Montgomery-encoded
values are different for 32-bit and 64-bit.
Generate some of the C boilerplate, particularly the large constants.
The output is written into target/curves/, and can be merged into
the actual code in crypto/fipsmodule/ec/ using a two-way merge tool;
this is the same as the Rust code generation.
Changes to gfp_p{256,384}.c are due to differences in the generator's
output:
* The generator doesn't generate trailing commas in arrays.
* The generator consistently avoids adding leading zeros to hex
  constants, and consistently formats values less than 10 in decimal;
  the existing code used a mix of styles.
* The generator wraps arrays consistently; the existing code used a
mix of wrapping styles.
* The generator does not nest constants in the functions that need
them. This was changed to support future refactorings.
Previously we swapped p and q and calculated a new qInv if p < q so
that we could avoid doing a reduction during the CRT computation.
Instead, just do the reduction during CRT as it's cheap. This
notably reduces the number of operations we need in `bigint`, and
it eliminates the need for the `Prime` modulus marker type.
Now there are more things that can go wrong during CRT. First, we
may wrongly forget to reduce m_2 mod p; before this wasn't necessary
since every element of q was an element of p. Next, we may wrongly
use the value of m_2 mod p instead of m_2 later; previously we could
do this because m_2 mod p == m_2, since m_2 < q < p. Add
tests for these cases.
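For reference, the standard RSA-CRT computation these tests exercise (PKCS#1
notation; the formulas are standard, not quoted from the commit):
```
\begin{align*}
  m_1 &= c^{d_P} \bmod p, \qquad m_2 = c^{d_Q} \bmod q, \\
  h   &= q_{\mathrm{inv}} \cdot (m_1 - m_2) \bmod p, \\
  m   &= m_2 + q \cdot h.
\end{align*}
```
The subtraction in h is done mod p, which is where m_2 now has to be reduced
mod p first; the final step, though, must use the original (unreduced) m_2.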
Rewrite the tests for `elem_reduced_once` given its new constraints.
QQ comprised almost 25% of the bulk of RsaKeyPair and is actually
completely unnecessary since `elem_reduced` can do the whole
reduction itself.
This has the nice and important side effect of eliminating some
conversion operations between `bigint` types.
This is also a step towards eliminating some of the `unsafe trait`
stuff that kinda-but-not-really modeled modulus relationships.
Move all the checks that are done for each private prime into
the `PrivatePrime` constructor, to eliminate duplication.
This causes the 512-bit-ness check to be done earlier than before,
which affects some of the tests.
Originally we only had `Modulus`. Then we had a need for a
temporary `Modulus` without `oneRR` so we created `PartialModulus`.
However, there is really nothing "partial" about them. So, improve
the naming by renaming `PartialModulus` to `Modulus` and `Modulus`
to `OwnedModulusWithOne`. In the future we may refactor things
further to separate the ownership aspect from the "has oneRR"
aspect.
Instead of just doing a straightforward rename, take this
opportunity to refactor the code so that it uses the new `Modulus`
whenever `oneRR()` isn't used. This eliminates the duplication of
the APIs of the two modulus types, and the duplication of
`elem_mul` and `elem_mul_`.
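A rough sketch of the resulting shape (only the type and function names come
from the commit; the fields and bodies here are illustrative placeholders):
```rust
// The owned type holds the limbs plus the precomputed 1*RR and hands out a
// borrowed `Modulus` view; everything that never touches `oneRR()` is written
// once against the view instead of being duplicated for both types.
struct Modulus<'a> {
    limbs: &'a [u64],
}

struct OwnedModulusWithOne {
    limbs: Vec<u64>,
    one_rr: Vec<u64>, // 1 * R**2 (mod m), i.e. what `oneRR()` returns
}

impl OwnedModulusWithOne {
    fn modulus(&self) -> Modulus<'_> {
        Modulus { limbs: &self.limbs }
    }
    fn one_rr(&self) -> &[u64] {
        &self.one_rr
    }
}

// A single `elem_mul` written against the borrowed view; callers that start
// from an `OwnedModulusWithOne` just pass `owned.modulus()`.
fn elem_mul(a: &[u64], b: &[u64], m: &Modulus) -> Vec<u64> {
    // Placeholder so the sketch compiles; the real routine is a Montgomery
    // multiplication over the limbs.
    let _ = (a, b);
    vec![0; m.limbs.len()]
}

fn main() {
    let owned = OwnedModulusWithOne {
        limbs: vec![0xffff_fffb],
        one_rr: vec![1],
    };
    let m = owned.modulus();
    let _product = elem_mul(&[2], owned.one_rr(), &m);
}
```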