Note: I originally tried an alternative implementation using `flat_map` that
ended up being materially slower. To fix that performance regression I had to
make the following change:
```
let mut output = Output([0; MAX_OUTPUT_LEN]);
output
.0
- .iter_mut()
- .zip(input.iter().copied().flat_map(|Wrapping(w)| f(w)))
+ .chunks_mut(N)
+ .zip(input.iter().copied().map(|Wrapping(w)| f(w)))
.for_each(|(o, i)| {
- *o = i;
+ o.copy_from_slice(&i);
});
output
}
```
I verified that this generates the same assembly code as the original code
on x86-64 using Rust 1.74.0, except that there are two additional 128-bit
moves in `sha256_format_output` to zero out the latter half of `Output`,
which was intended.
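For reference, the two iteration shapes from the diff can be compared in
isolation. This is a minimal sketch, not the actual code: the constants, the
function names, and the use of `u32::to_be_bytes` in place of `f` are all
assumptions made for illustration.
```rust
use std::num::Wrapping;

// Assumed sizes for this sketch only.
const N: usize = 4; // bytes produced per input word
const MAX_OUTPUT_LEN: usize = 16;

/// Byte-at-a-time shape: flatten every word into bytes and store each one.
fn format_output_flat_map(input: [Wrapping<u32>; 2]) -> [u8; MAX_OUTPUT_LEN] {
    let mut output = [0u8; MAX_OUTPUT_LEN];
    output
        .iter_mut()
        .zip(input.iter().copied().flat_map(|Wrapping(w)| w.to_be_bytes()))
        .for_each(|(o, i)| *o = i);
    output
}

/// Chunked shape: copy each word's N bytes into an N-byte output chunk.
fn format_output_chunks(input: [Wrapping<u32>; 2]) -> [u8; MAX_OUTPUT_LEN] {
    let mut output = [0u8; MAX_OUTPUT_LEN];
    output
        .chunks_mut(N)
        .zip(input.iter().copied().map(|Wrapping(w)| w.to_be_bytes()))
        .for_each(|(o, i)| o.copy_from_slice(&i));
    output
}
```
Both functions fill the same leading bytes and leave the rest zeroed; only the
chunked shape gives the compiler fixed-size copies to work with.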
Clarify how the math works, and use a slightly better trade-off of
doubling vs squaring. On 64-bit targets RSA verification is now
faster, though by less than 10%. On 32-bit targets it's over 20% faster. I
expect that we can improve the performance further by optimizing
the doubling implementation.
Also the new implementation avoids allocating/cloning any temporary
`Elem`s, unlike the previous implementation.
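The doubling-vs-squaring trade-off can be illustrated with plain modular
arithmetic. This is a toy sketch under stated assumptions: it uses a
single-word modulus and ordinary (non-Montgomery) multiplication, and the
function names are hypothetical. Each doubling adds 1 to the exponent of 2 and
each squaring doubles it, so `d` doublings followed by `s` squarings reach
2^(d * 2^s).
```rust
/// (a * b) mod m, widened to u128 so the product cannot overflow.
fn mul_mod(a: u64, b: u64, m: u64) -> u64 {
    (u128::from(a) * u128::from(b) % u128::from(m)) as u64
}

/// (2 * a) mod m.
fn double_mod(a: u64, m: u64) -> u64 {
    (u128::from(a) * 2 % u128::from(m)) as u64
}

/// Computes 2^(doublings * 2^squarings) mod m: first the doublings,
/// then the squarings.
fn pow2_mod(m: u64, doublings: u32, squarings: u32) -> u64 {
    let mut acc = 1 % m;
    for _ in 0..doublings {
        acc = double_mod(acc, m);
    }
    for _ in 0..squarings {
        acc = mul_mod(acc, acc, m);
    }
    acc
}
```
For example, 2^128 mod m can be reached with 128 doublings, with 8 doublings
and 4 squarings, or with 2 doublings and 6 squarings; which split is fastest
depends on the relative cost of the two operations.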
Save two private-modulus Montgomery multiplications per RSA exponentiation
at the cost of approximately two modulus-wide XORs.
The new `oneR()` is extracted from the Montgomery RR setup.
Remove the use of `One<RR>` in `elem_exp_consttime`.
Eliminate one modular doubling in Montgomery RR setup. This saves one
public modulus modular doubling per RSA signature verification, at the
cost of approximately one public-modulus-wide XOR. RsaKeyPair also sees
similar savings per Modulus.
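One way a modulus-wide XOR can stand in for modular arithmetic, sketched on a
single 64-bit word. Whether this matches the actual implementation is an
assumption; the function name is hypothetical. For an odd `m` with its top bit
set, 2^64 - m is already less than `m`, and since `!m` is even, `!m + 1` equals
`!m | 1`.
```rust
/// Returns 2^64 mod m for an odd `m` whose top bit is set.
/// Toy single-word version; a real modulus spans many limbs.
fn one_r(m: u64) -> u64 {
    assert!(m & 1 == 1, "modulus must be odd");
    assert!(m >> 63 == 1, "modulus must have its top bit set");
    // 2^64 - m == !m + 1, and !m is even, so the +1 is just setting bit 0.
    (m ^ u64::MAX) | 1
}
```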
This Cargo feature treats a user-provided `getrandom` implementation as
a secure random number generator (`SecureRandom`). The feature only has
an effect on targets not supported by `getrandom`.
I agree to license my contributions to each file under the terms given
at the top of each file I changed.
Values for P-521 have an odd number of limbs in 32-bit mode, which
means we can't keep using `TOBN`, and also Montgomery-encoded
values are different for 32-bit and 64-bit.
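The limb counts behind this can be checked with a few lines. This sketch
assumes the Montgomery radix is R = 2^(limb width * number of limbs); the
helper names are hypothetical.
```rust
/// Number of limbs needed to hold a `field_bits`-bit value.
fn num_limbs(field_bits: u32, limb_bits: u32) -> u32 {
    (field_bits + limb_bits - 1) / limb_bits
}

/// log2 of the Montgomery radix R, assuming R = 2^(limb_bits * num_limbs).
fn montgomery_r_bits(field_bits: u32, limb_bits: u32) -> u32 {
    num_limbs(field_bits, limb_bits) * limb_bits
}
```
Under that assumption P-521 needs 17 limbs (R = 2^544) on 32-bit targets and
9 limbs (R = 2^576) on 64-bit targets, so the encoded values differ, whereas
P-256 and P-384 get the same R for both limb widths.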
Generate some of the C boilerplate, particularly the large constants.
The output is written into target/curves/, and can be merged into
the actual code in crypto/fipsmodule/ec/ using a two-way merge tool;
this is the same as the Rust code generation.
Changes to gfp_p{256,384}.c are due to differences in the generator's
output:
* The generator doesn't generate trailing commas in arrays.
* The generator consistently avoids adding leading zeros to hex
constants, and consistently formats values less than 10 in decimal;
the existing code used a mix of styles.
* The generator wraps arrays consistently; the existing code used a
mix of wrapping styles.
* The generator does not nest constants in the functions that need
them. This was changed to support future refactorings.
Previously we swapped p and q and calculated a new qInv if p < q so
that we could avoid doing a reduction during the CRT computation.
Instead, just do the reduction during CRT as it's cheap. This
notably reduces the number of operations we need in `bigint`, and
it eliminates the need for the `Prime` modulus marker type.
Now there are more things that can go wrong during CRT. First, we
may wrongly forget to reduce m_2 mod p; before this wasn't necessary
since every element of q was an element of p. Next, we may wrongly
use the value of m_2 mod p instead of m_2 later; before we could
do this since m_2 mod p == m_2 when m_2 < q < p. Add
tests for these cases.
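The two pitfalls can be seen in a toy CRT computation. This is a minimal
sketch with word-sized numbers and hypothetical function names, not the
`bigint` code; it follows the usual m = m_2 + q * h recombination.
```rust
/// base^exp mod m by square-and-multiply; only safe for small toy values.
fn mod_pow(mut base: u64, mut exp: u64, m: u64) -> u64 {
    let mut acc = 1 % m;
    base %= m;
    while exp > 0 {
        if exp & 1 == 1 {
            acc = acc * base % m;
        }
        base = base * base % m;
        exp >>= 1;
    }
    acc
}

/// RSA private operation via CRT that works whether p > q or p < q.
fn crt_private_op(c: u64, p: u64, q: u64, d_p: u64, d_q: u64, q_inv: u64) -> u64 {
    let m_1 = mod_pow(c, d_p, p);
    let m_2 = mod_pow(c, d_q, q);
    // Pitfall 1: m_2 may be >= p when q > p, so it must be reduced mod p
    // before the subtraction.
    let m_2_mod_p = m_2 % p;
    let h = q_inv * ((m_1 + p - m_2_mod_p) % p) % p;
    // Pitfall 2: the recombination must use the unreduced m_2.
    m_2 + q * h
}
```
With p = 11, q = 13, d = 103 (so d_p = 3, d_q = 7, qInv = 6), the result
should agree with a direct `mod_pow(c, 103, 143)` for every c below 143.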
Rewrite the tests for `elem_reduced_once` given its new constraints.
QQ comprised almost 25% of the bulk of RsaKeyPair and is actually
completely unnecessary since `elem_reduced` can do the whole
reduction itself.
This has the nice and important side effect of eliminating some
conversion operations between `bigint` types.
This is also a step towards eliminating some of the `unsafe trait`
stuff that kinda-but-not-really modeled modulus relationships.
Move all the checks that are done for each private prime into
the `PrivatePrime` constructor, to eliminate duplication.
This causes the 512-bit-ness check to be done earlier than before,
which affects some of the tests.
Originally we only had `Modulus`. Then we had a need for a
temporary `Modulus` without `oneRR` so we created `PartialModulus`.
However, there is really nothing "partial" about them. So, improve
the naming by renaming `PartialModulus` to `Modulus` and `Modulus`
to `OwnedModulusWithOne`. In the future we may refactor things
further to separate the ownership aspect from the "has oneRR"
aspect.
Instead of just doing a straightforward rename, take this
opportunity to refactor the code so that it uses the new `Modulus`
whenever `oneRR()` isn't used. This eliminates the duplication of
the APIs of the two modulus types, and the duplication of
`elem_mul` and `elem_mul_`.
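The shape of the split can be sketched with toy types. Everything below is a
hypothetical stand-in (single-word modulus, plain modular multiplication); only
the type names and `oneRR` come from the description above.
```rust
/// Toy stand-in for the owning type: the modulus plus a precomputed oneRR.
struct OwnedModulusWithOne {
    m: u64,
    one_rr: u64,
}

/// Toy stand-in for the lightweight view that most operations take.
struct Modulus<'a> {
    m: &'a u64,
}

impl OwnedModulusWithOne {
    fn new(m: u64) -> Self {
        // In this toy R = 2^64, so oneRR = R^2 mod m.
        let r = (1u128 << 64) % u128::from(m);
        let one_rr = (r * r % u128::from(m)) as u64;
        Self { m, one_rr }
    }

    /// Borrowed view for code that never touches oneRR.
    fn modulus(&self) -> Modulus<'_> {
        Modulus { m: &self.m }
    }

    fn one_rr(&self) -> u64 {
        self.one_rr
    }
}

/// A single multiplication routine that only needs the view.
fn elem_mul(a: u64, b: u64, m: &Modulus) -> u64 {
    (u128::from(a) * u128::from(b) % u128::from(*m.m)) as u64
}
```
Because `elem_mul` takes only the view, there is no need for a second copy of
it that accepts the owning type.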
The original idea of `Width` was that we'd support operations that
worked on multiple same-width but different-modulus values, and/or
we'd support splitting a 2N-limb `BoxedLimb` into two N-limb
`&[Limb]`, etc. However, as things are now, `Width` doesn't really
serve a useful purpose.
When we added `rsa::PublicKey` we changed the `ring::signature` RSA
implementation to construct an `rsa::PublicKey` and then verify the
signature using it. Unfortunately, for backward compatibility with old
uses of `RsaKeyPair`, the `rsa::PublicKey` constructor constructs (and
allocates) a copy of the ASN.1-serialized public key. This is not
acceptable for users who are using `ring::signature` to verify a
single signature. Refactor `PublicKey` so that it can be bypassed
by the `ring::signature` implementation.
This is a step towards implementing allocation-free RSA signature
verification.
Align with the other use of `OPENSSL_memcpy` in `curve25519_64_adx.h`.
`string.h` will no longer be needed.
Signed-off-by: Jiaqi Gao <jiaqi.gao@intel.com>
For now, just put `#[allow(...)]` directives in the places where the
conversions are done. We'll follow up in the future with the correct
replacement for `as` for each case, as several PRs.
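As a general illustration of what those replacements can look like (which one
fits each call site is left to the follow-up PRs), the standard library offers
`From` for lossless widening and `TryFrom` for checked narrowing. The function
names here are hypothetical.
```rust
use std::convert::TryFrom;

/// Lossless widening: `From` cannot fail, so no `as` is needed.
fn widen(x: u32) -> u64 {
    u64::from(x)
}

/// Checked narrowing: `TryFrom` reports values that do not fit.
fn narrow(x: u64) -> Option<u32> {
    u32::try_from(x).ok()
}

/// What `as` does instead: silently keeps only the low 32 bits.
fn truncate_with_as(x: u64) -> u32 {
    x as u32
}
```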
Do it because BoringSSL does it. BoringSSL has some other headers it
includes here but we intentionally do not have them and/or we
intentionally do not include them here (string.h and assert.h).
When this benchmark was imported from crypto-bench to *ring* and
ported from the libtest `#[bench]` framework to Criterion.rs, we
kept the macro-based structure from the original benchmarks. However,
Criterion.rs actually supports the kind of parameterized benchmarking
we do much more naturally, and so we don't need the macros. Get rid of
them.
Also remove the distinction between TLS 1.2 and TLS 1.3 AAD. These
benchmarks were originally written long ago when the TLS 1.3 draft
specified a different AAD format.
I hope this will serve as a better example of how to write such
benchmarks than it previously did.
Add a new scalar base point multiplication test case generator
where the points are *not* Montgomery-encoded. This way we don't need
to generate different test data files when the Montgomery encoding
for a curve isn't the same for 32-bit and 64-bit targets (P-521).
This version of the generator produces the test cases for all the
scalars that the current P-256 and P-384 tests generate, in the same
format; the only exception is that the point is not
Montgomery-encoded.
When I generated these test vectors, I gave all of them the same point: the
generator of the curve. Consequently these input files are 100% redundant
with the `point_mul_base.txt` input files. So just remove them and use the
`point_mul_base.txt` files instead.