Revert the names used in the BoringSSL C/asm code to the names used in
BoringSSL. This substantially reduces the diff between *ring* and
BoringSSL for these files.
Use a variant of BoringSSL's symbol prefixing machinery to semi-
automatically prefix FFI symbols with the `GFp_` prefix. The names aren't
all exactly the same as before, because previously we *replaced* a
symbol's original prefix with the `GFp_` prefix; now we're prepending
`GFp_`. In the future we'll use a different prefix entirely.
This paves the way for using different prefixes for each version so that
multiple versions of *ring* can be linked into an executable at once.
This change adds optional support for
- Armv8.3-A Pointer Authentication (PAuth) and
- Armv8.5-A Branch Target Identification (BTI)
features to the perl scripts.
Both features can be enabled with additional compiler flags.
Unless any of these are enabled explicitly, there is no code change at
all.
The extensions are briefly described below. Please read the appropriate
chapters of the Arm Architecture Reference Manual for the complete
specification.
Scope
-----
This change only affects generated assembly code.
Armv8.3-A Pointer Authentication
--------------------------------
The Pointer Authentication extension supports the authentication of the
contents of registers before they are used for indirect branching
or load.
PAuth provides a probabilistic method to detect corruption of register
values. PAuth signing instructions generate a Pointer Authentication
Code (PAC) based on the value of a register, a seed and a key.
The generated PAC is inserted into the original value in the register.
A PAuth authentication instruction recomputes the PAC, and if it matches
the PAC in the register, restores its original value. In case of a
mismatch, an architecturally unmapped address is generated instead.
With PAuth, mitigation against ROP (Return-oriented Programming) attacks
can be implemented. This is achieved by signing the contents of the
link register (LR) before it is pushed to the stack. Once LR is popped,
it is authenticated. This way a stack corruption which overwrites the
LR on the stack is detectable.
The PAuth extension adds several new instructions, some of which are not
recognized by older hardware. To support a single codebase for both
pre-Armv8.3-A targets and newer ones, this patch adds only NOP-space
instructions. These instructions are treated as NOPs on hardware
which does not support Armv8.3-A. Furthermore, this patch only considers
cases where LR is saved to the stack and then restored before branching
to its content. There are cases in the code where LR is pushed to the
stack but not used later. We do not address these cases, as they are not
affected by PAuth.
There are two keys available to sign an instruction address: A and B.
PACIASP and PACIBSP differ only in the key used: A and B, respectively.
The keys are typically managed by the operating system.
To enable generating code for PAuth, compile with
-mbranch-protection=<mode>:
- standard or pac-ret: add PACIASP and AUTIASP (standard also enables
BTI; read below)
- pac-ret+b-key: add PACIBSP and AUTIBSP
Armv8.5-A Branch Target Identification
--------------------------------------
Branch Target Identification features some new instructions which
protect the execution of instructions on guarded pages which are not
intended branch targets.
If Armv8.5-A is supported by the hardware, execution of an instruction
changes the value of the PSTATE.BTYPE field. If an indirect branch
lands on a guarded page, the target instruction must be one of the
BTI <jc> flavors; in the case of a direct call or jump, it can be any
other instruction. If the target instruction is not compatible with the
value of PSTATE.BTYPE, a Branch Target Exception is generated.
In short, indirect jumps are compatible with BTI <j> and <jc> while
indirect calls are compatible with BTI <c> and <jc>. Please refer to the
specification for the details.
Armv8.3-A PACIASP and PACIBSP are implicit branch target
identification instructions which are equivalent to BTI c or BTI jc,
depending on system register configuration.
BTI is used to mitigate JOP (Jump-oriented Programming) attacks by
limiting the set of instructions which can be jumped to.
BTI requires active linker support to mark the pages with BTI-enabled
code as guarded. For ELF64 files, BTI compatibility is recorded in the
.note.gnu.property section. For a shared object or static binary, all
linked units must support BTI. This means that even a single assembly
file without the required note section turns off BTI for the whole
binary or shared object.
The new BTI instructions are treated as NOPs on hardware which does
not support Armv8.5-A or on pages which are not guarded.
To insert this new and optional instruction compile with
-mbranch-protection=standard (also enables PAuth) or +bti.
When targeting a guarded page from a non-guarded page, weaker
compatibility restrictions apply to maintain compatibility between
legacy and new code. For detailed rules please refer to the Arm ARM.
Compiler support
----------------
Compiler support requires understanding '-mbranch-protection=<mode>'
and emitting the appropriate feature macros (__ARM_FEATURE_BTI_DEFAULT
and __ARM_FEATURE_PAC_DEFAULT). The current state (first version with
support) is the following:
+----------+---------------------+--------------------+
| Compiler | -mbranch-protection | Feature macros     |
+----------+---------------------+--------------------+
| clang    | 9.0.0               | 11.0.0             |
+----------+---------------------+--------------------+
| gcc      | 9                   | expected in 10.1+  |
+----------+---------------------+--------------------+
Available Platforms
-------------------
Arm Fast Model and QEMU support both extensions.
https://developer.arm.com/tools-and-software/simulation-models/fast-models
https://www.qemu.org/
Implementation Notes
--------------------
This change adds BTI landing pads even to assembly functions which are
likely to be directly called only. In these cases, landing pads might
be superfluous depending on what code the linker generates.
The code size and performance impact for these cases would be negligible.
Interaction with C code
-----------------------
Pointer Authentication is a per-frame protection while Branch Target
Identification can be turned on and off only for all code pages of a
whole shared object or static binary. Because of these properties, if
C/C++ code is compiled without any of the above features but assembly
files support any of them unconditionally, there is no incompatibility
between the two.
Useful Links
------------
To fully understand the details of both PAuth and BTI it is advised to
read the related chapters of the Arm Architecture Reference Manual
(Arm ARM):
https://developer.arm.com/documentation/ddi0487/latest/
Additional materials:
"Providing protection for complex software"
https://developer.arm.com/architectures/learn-the-architecture/providing-protection-for-complex-software
Arm Compiler Reference Guide Version 6.14: -mbranch-protection
https://developer.arm.com/documentation/101754/0614/armclang-Reference/armclang-Command-line-Options/-mbranch-protection?lang=en
Arm C Language Extensions (ACLE)
https://developer.arm.com/docs/101028/latest
Change-Id: I4335f92e2ccc8e209c7d68a0a79f1acdf3aeb791
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/42084
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: Adam Langley <agl@google.com>
implementation.
*ring* tries to work without type-punning `memcpy`, so the use of that
in `GFp_gcm_ghash_nohw` was replaced by the use of `u64_from_be_bytes`.
This will (I hope) also help with the eventual support for big-endian
targets. Here's the diff from BoringSSL in that function:
```diff
-void gcm_ghash_nohw(uint64_t Xi[2], const u128 Htable[16], const uint8_t *inp,
-                    size_t len) {
+void GFp_gcm_ghash_nohw(uint64_t Xi[2], const u128 Htable[16], const uint8_t *inp,
+                        size_t len) {
   uint64_t swapped[2];
   swapped[0] = CRYPTO_bswap8(Xi[1]);
   swapped[1] = CRYPTO_bswap8(Xi[0]);
   while (len >= 16) {
-    uint64_t block[2];
-    OPENSSL_memcpy(block, inp, 16);
-    swapped[0] ^= CRYPTO_bswap8(block[1]);
-    swapped[1] ^= CRYPTO_bswap8(block[0]);
+    swapped[0] ^= u64_from_be_bytes(&inp[8]);
+    swapped[1] ^= u64_from_be_bytes(inp);
     gcm_polyval_nohw(swapped, &Htable[0]);
     inp += 16;
     len -= 16;
```
I also had to add a couple of (uint32_t) truncating casts where
BoringSSL expects an implicit truncation to occur, to avoid
`-Werror=conversion`.
During the merge, I found that `GFp_gcm_gmult_clmul` had its
`.cfi_startproc` on the wrong line. I fixed that as part of the merge.
During my review of the BoringSSL changes, I noticed that BoringSSL had
left some of the dead code in ghash-x86_64.pl, which had previously been
removed in *ring*. That removal is being done in BoringSSL in [1].
[1] https://boringssl-review.googlesource.com/c/boringssl/+/41144
This shrinks the perf gap between nohw and 4bit_mmx. Replace 4bit_mmx
and fix the last remaining variable-time GHASH implementation, covering
32-bit x86 without SSSE3.
Before:
Did 2065000 AES-128-GCM (16 bytes) seal operations in 1000154us (2064682.0 ops/sec): 33.0 MB/s
Did 368000 AES-128-GCM (256 bytes) seal operations in 1002435us (367106.1 ops/sec): 94.0 MB/s
Did 77000 AES-128-GCM (1350 bytes) seal operations in 1001225us (76905.8 ops/sec): 103.8 MB/s
Did 14000 AES-128-GCM (8192 bytes) seal operations in 1067523us (13114.5 ops/sec): 107.4 MB/s
Did 6572 AES-128-GCM (16384 bytes) seal operations in 1015976us (6468.7 ops/sec): 106.0 MB/s
After:
Did 1995000 AES-128-GCM (16 bytes) seal operations in 1000374us (1994254.1 ops/sec): 31.9 MB/s
Did 319000 AES-128-GCM (256 bytes) seal operations in 1000196us (318937.5 ops/sec): 81.6 MB/s
Did 66000 AES-128-GCM (1350 bytes) seal operations in 1002823us (65814.2 ops/sec): 88.8 MB/s
Did 12000 AES-128-GCM (8192 bytes) seal operations in 1079294us (11118.4 ops/sec): 91.1 MB/s
Did 5511 AES-128-GCM (16384 bytes) seal operations in 1006218us (5476.9 ops/sec): 89.7 MB/s
(Note fallback AES is dampening the perf hit. Pairing with AESNI to
roughly isolate GHASH shows a 40% hit.)
That just leaves aes_nohw...
Change-Id: I7d842806c54a5a057895fa2e7665633330e34b72
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/38784
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
We have several variable-time table-based GHASH implementations, called
"4bit" in the code. We have a fallback one in C and assembly
implementations for x86, x86_64, and armv4. These are used if assembly is
off or if the hardware lacks NEON or SSSE3.
Note these benchmarks are all on hardware several generations beyond
what would actually run this code, so it's a bit artificial.
Implement a constant-time implementation of GHASH based on the notes in
https://bearssl.org/constanttime.html#ghash-for-gcm, as well as the
reduction algorithm described in
https://crypto.stanford.edu/RealWorldCrypto/slides/gueron.pdf.
This new implementation is actually faster than the fallback C code for
both 32-bit and 64-bit. It is slower than the assembly implementations,
particularly for 32-bit. I've left 32-bit x86 alone but replaced the
x86_64 and armv4 ones. The perf hit on x86_64 is smaller and affects a
small percentage of 64-bit Chrome on Windows users. ARM chips without
NEON are rare (Chrome for Android requires it), so replace that too.
The answer for 32-bit x86 is unclear. More 32-bit Chrome on Windows
users lack SSSE3, and the perf hit is dramatic. gcm_gmult_4bit_mmx uses
SSE2, so perhaps we can close the gap with an SSE2 version of this
strategy, or perhaps we can decide this perf hit is worth fixing the
timing leaks.
32-bit x86 with OPENSSL_NO_ASM
Before: (4bit C)
Did 1136000 AES-128-GCM (16 bytes) seal operations in 1000762us (1135135.0 ops/sec): 18.2 MB/s
Did 190000 AES-128-GCM (256 bytes) seal operations in 1003533us (189331.1 ops/sec): 48.5 MB/s
Did 40000 AES-128-GCM (1350 bytes) seal operations in 1022114us (39134.6 ops/sec): 52.8 MB/s
Did 7282 AES-128-GCM (8192 bytes) seal operations in 1117575us (6515.9 ops/sec): 53.4 MB/s
Did 3663 AES-128-GCM (16384 bytes) seal operations in 1098538us (3334.4 ops/sec): 54.6 MB/s
After:
Did 1503000 AES-128-GCM (16 bytes) seal operations in 1000054us (1502918.8 ops/sec): 24.0 MB/s
Did 252000 AES-128-GCM (256 bytes) seal operations in 1001173us (251704.8 ops/sec): 64.4 MB/s
Did 53000 AES-128-GCM (1350 bytes) seal operations in 1016983us (52114.9 ops/sec): 70.4 MB/s
Did 9317 AES-128-GCM (8192 bytes) seal operations in 1056367us (8819.9 ops/sec): 72.3 MB/s
Did 4356 AES-128-GCM (16384 bytes) seal operations in 1000445us (4354.1 ops/sec): 71.3 MB/s
64-bit x86 with OPENSSL_NO_ASM
Before: (4bit C)
Did 2976000 AES-128-GCM (16 bytes) seal operations in 1000258us (2975232.4 ops/sec): 47.6 MB/s
Did 510000 AES-128-GCM (256 bytes) seal operations in 1000295us (509849.6 ops/sec): 130.5 MB/s
Did 106000 AES-128-GCM (1350 bytes) seal operations in 1001573us (105833.5 ops/sec): 142.9 MB/s
Did 18000 AES-128-GCM (8192 bytes) seal operations in 1003895us (17930.2 ops/sec): 146.9 MB/s
Did 9000 AES-128-GCM (16384 bytes) seal operations in 1003352us (8969.9 ops/sec): 147.0 MB/s
After:
Did 2972000 AES-128-GCM (16 bytes) seal operations in 1000178us (2971471.1 ops/sec): 47.5 MB/s
Did 515000 AES-128-GCM (256 bytes) seal operations in 1001850us (514049.0 ops/sec): 131.6 MB/s
Did 108000 AES-128-GCM (1350 bytes) seal operations in 1004941us (107469.0 ops/sec): 145.1 MB/s
Did 19000 AES-128-GCM (8192 bytes) seal operations in 1034966us (18358.1 ops/sec): 150.4 MB/s
Did 9250 AES-128-GCM (16384 bytes) seal operations in 1005269us (9201.5 ops/sec): 150.8 MB/s
32-bit ARM without NEON
Before: (4bit armv4 asm)
Did 952000 AES-128-GCM (16 bytes) seal operations in 1001009us (951040.4 ops/sec): 15.2 MB/s
Did 152000 AES-128-GCM (256 bytes) seal operations in 1005576us (151157.1 ops/sec): 38.7 MB/s
Did 32000 AES-128-GCM (1350 bytes) seal operations in 1024522us (31234.1 ops/sec): 42.2 MB/s
Did 5290 AES-128-GCM (8192 bytes) seal operations in 1005335us (5261.9 ops/sec): 43.1 MB/s
Did 2650 AES-128-GCM (16384 bytes) seal operations in 1004396us (2638.4 ops/sec): 43.2 MB/s
After:
Did 540000 AES-128-GCM (16 bytes) seal operations in 1000009us (539995.1 ops/sec): 8.6 MB/s
Did 90000 AES-128-GCM (256 bytes) seal operations in 1000028us (89997.5 ops/sec): 23.0 MB/s
Did 19000 AES-128-GCM (1350 bytes) seal operations in 1022041us (18590.3 ops/sec): 25.1 MB/s
Did 3150 AES-128-GCM (8192 bytes) seal operations in 1003199us (3140.0 ops/sec): 25.7 MB/s
Did 1694 AES-128-GCM (16384 bytes) seal operations in 1076156us (1574.1 ops/sec): 25.8 MB/s
(Note fallback AES is dampening the perf hit.)
64-bit x86 with OPENSSL_ia32cap=0
Before: (4bit x86_64 asm)
Did 2615000 AES-128-GCM (16 bytes) seal operations in 1000220us (2614424.8 ops/sec): 41.8 MB/s
Did 431000 AES-128-GCM (256 bytes) seal operations in 1001250us (430461.9 ops/sec): 110.2 MB/s
Did 89000 AES-128-GCM (1350 bytes) seal operations in 1002209us (88803.8 ops/sec): 119.9 MB/s
Did 16000 AES-128-GCM (8192 bytes) seal operations in 1064535us (15030.0 ops/sec): 123.1 MB/s
Did 8261 AES-128-GCM (16384 bytes) seal operations in 1096787us (7532.0 ops/sec): 123.4 MB/s
After:
Did 2355000 AES-128-GCM (16 bytes) seal operations in 1000096us (2354773.9 ops/sec): 37.7 MB/s
Did 373000 AES-128-GCM (256 bytes) seal operations in 1000981us (372634.4 ops/sec): 95.4 MB/s
Did 77000 AES-128-GCM (1350 bytes) seal operations in 1003557us (76727.1 ops/sec): 103.6 MB/s
Did 13000 AES-128-GCM (8192 bytes) seal operations in 1003058us (12960.4 ops/sec): 106.2 MB/s
Did 7139 AES-128-GCM (16384 bytes) seal operations in 1099576us (6492.5 ops/sec): 106.4 MB/s
(Note fallback AES is dampening the perf hit. Pairing with AESNI to roughly
isolate GHASH shows a 40% hit.)
For comparison, this is what removing gcm_gmult_4bit_mmx would do.
32-bit x86 with OPENSSL_ia32cap=0
Before:
Did 2014000 AES-128-GCM (16 bytes) seal operations in 1000026us (2013947.6 ops/sec): 32.2 MB/s
Did 367000 AES-128-GCM (256 bytes) seal operations in 1000097us (366964.4 ops/sec): 93.9 MB/s
Did 77000 AES-128-GCM (1350 bytes) seal operations in 1002135us (76836.0 ops/sec): 103.7 MB/s
Did 13000 AES-128-GCM (8192 bytes) seal operations in 1011394us (12853.5 ops/sec): 105.3 MB/s
Did 7227 AES-128-GCM (16384 bytes) seal operations in 1099409us (6573.5 ops/sec): 107.7 MB/s
If gcm_gmult_4bit_mmx were replaced:
Did 1350000 AES-128-GCM (16 bytes) seal operations in 1000128us (1349827.2 ops/sec): 21.6 MB/s
Did 219000 AES-128-GCM (256 bytes) seal operations in 1000090us (218980.3 ops/sec): 56.1 MB/s
Did 46000 AES-128-GCM (1350 bytes) seal operations in 1017365us (45214.8 ops/sec): 61.0 MB/s
Did 8393 AES-128-GCM (8192 bytes) seal operations in 1115579us (7523.4 ops/sec): 61.6 MB/s
Did 3840 AES-128-GCM (16384 bytes) seal operations in 1001928us (3832.6 ops/sec): 62.8 MB/s
(Note fallback AES is dampening the perf hit. Pairing with AESNI to roughly
isolate GHASH shows a 73% hit. gcm_gmult_4bit_mmx is almost 4x as fast.)
Change-Id: Ib28c981e92e200b17fb9ddc89aef695ac6733a43
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/38724
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: Adam Langley <agl@google.com>
The assembly dispatch tests currently assume NDEBUG is consistently
defined between C/C++ and assembly. While this is usually the case for
UNIX, CMake does not pass NDEBUG to NASM. This is giving gRPC some
difficulties in updating BoringSSL, so switch it to an opt-in
-DBORINGSSL_DISPATCH_TEST flag instead.
Update-Note: If you were copying NDEBUG over to assembly files, that's
no longer required (though it's harmless to leave it in). If you want to
run ImplDispatchTest.*, build both C/C++ and assembly with
-DBORINGSSL_DISPATCH_TEST in your debug builds. (Don't enable it in
release builds. It causes assembly to scribble in some globals.)
Change-Id: I9ab3371dc0f0a40b27b44ef93835e007a6346900
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/37764
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: Adam Langley <agl@google.com>
When running the ARM perlasm files on Windows, close STDOUT fails. There
appears to be some weird quirk on Windows when one replaces STDOUT with
a pipe. The x86_64.pl files all avoid this by opening OUT and then
setting *STDOUT=*OUT. Align all the ARM files with that pattern.
See https://ci.appveyor.com/project/conscrypt/conscrypt
Change-Id: Ibee9427a05d806f7f23a6d9817394cfabf2f534a
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/37324
Reviewed-by: Kenny Root <kroot@google.com>
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: Adam Langley <agl@google.com>
If the xlate filter script fails, the outer script swallows the error,
unless we check the return value of close.
Change-Id: Ib506bb745a5d27b9d1df9329535bf81ad090f41f
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/35724
Reviewed-by: Adam Langley <agl@google.com>
This makes AES-GCM always constant-time on aarch64 (provided assembly is
enabled). Unlike vpaes, this does come at a binary size penalty of 1K
compared to the gcm_*_4bit version.
ABI testing already covered by GCMTest.ABI (GHASH_ASM_ARM covers both
OPENSSL_ARM and OPENSSL_AARCH64.)
Cortex-A53 (Raspberry Pi 3 Model B+)
Before:
Did 274000 AES-128-GCM (16 bytes) seal operations in 1003461us (273055.0 ops/sec): 4.4 MB/s
Did 53000 AES-128-GCM (256 bytes) seal operations in 1007689us (52595.6 ops/sec): 13.5 MB/s
Did 12000 AES-128-GCM (1350 bytes) seal operations in 1075908us (11153.4 ops/sec): 15.1 MB/s
Did 2068 AES-128-GCM (8192 bytes) seal operations in 1089037us (1898.9 ops/sec): 15.6 MB/s
After:
Did 298000 AES-128-GCM (16 bytes) seal operations in 1002917us (297133.3 ops/sec): 4.8 MB/s
Did 64000 AES-128-GCM (256 bytes) seal operations in 1001124us (63928.1 ops/sec): 16.4 MB/s
Did 14000 AES-128-GCM (1350 bytes) seal operations in 1015477us (13786.6 ops/sec): 18.6 MB/s
Did 2497 AES-128-GCM (8192 bytes) seal operations in 1057951us (2360.2 ops/sec): 19.3 MB/s
Bug: 265
Change-Id: I251bf0f2eae0578580bb14192755e5d8ff64cd14
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/35285
Reviewed-by: Adam Langley <agl@google.com>
This imports ce5eb5e8149d8d03660575f4b8504c993851988a and
1212818eb07add297fe562eba80ac46a9893781e from OpenSSL's 1.1.1 branch.
Change-Id: I121c0771371697191a163a28d972a7b3cee37762
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/35164
Reviewed-by: Adam Langley <agl@google.com>
The 64-bit version can be fairly straightforwardly translated.
Ironically, this makes 32-bit x86 the first architecture to meet the
goal of constant-time AES-GCM given SIMD assembly. (Though x86_64 could
join by simply giving up on bsaes...)
Bug: 263
Change-Id: Icb2cec936457fac7132bbb5dbb094433bc14b86e
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/35024
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: Adam Langley <agl@google.com>
Rename some GCM assembly functions so that all functions that do the
same thing the same way have the same name, to make the dispatching
logic simpler.
Thread CPU feature caching witnesses through the GCM dispatching logic
to make feature detection less error-prone.
Start an internal Rust API for feature detection.
We currently require clmul instructions for constant-time GHASH
on x86_64. Otherwise, it falls back to a variable-time 4-bit table
implementation. However, a significant proportion of clients lack these
instructions.
Inspired by vpaes, we can use pshufb and a slightly different order of
incorporating the bits to make a constant-time GHASH. This requires
SSSE3, which is very common. Benchmarking old machines we had on hand,
it appears to be a no-op on Sandy Bridge and a small slowdown for
Penryn.
Sandy Bridge (Intel Pentium CPU 987 @ 1.50GHz):
(Note: these numbers are before 16-byte-aligning the table. That was an
improvement on Penryn, so it's possible Sandy Bridge is now better.)
Before:
Did 4244750 AES-128-GCM (16 bytes) seal operations in 4015000us (1057222.9 ops/sec): 16.9 MB/s
Did 442000 AES-128-GCM (1350 bytes) seal operations in 4016000us (110059.8 ops/sec): 148.6 MB/s
Did 84000 AES-128-GCM (8192 bytes) seal operations in 4015000us (20921.5 ops/sec): 171.4 MB/s
Did 3349250 AES-256-GCM (16 bytes) seal operations in 4016000us (833976.6 ops/sec): 13.3 MB/s
Did 343500 AES-256-GCM (1350 bytes) seal operations in 4016000us (85532.9 ops/sec): 115.5 MB/s
Did 65250 AES-256-GCM (8192 bytes) seal operations in 4015000us (16251.6 ops/sec): 133.1 MB/s
After:
Did 4229250 AES-128-GCM (16 bytes) seal operations in 4016000us (1053100.1 ops/sec): 16.8 MB/s [-0.4%]
Did 442250 AES-128-GCM (1350 bytes) seal operations in 4016000us (110122.0 ops/sec): 148.7 MB/s [+0.1%]
Did 83500 AES-128-GCM (8192 bytes) seal operations in 4015000us (20797.0 ops/sec): 170.4 MB/s [-0.6%]
Did 3286500 AES-256-GCM (16 bytes) seal operations in 4016000us (818351.6 ops/sec): 13.1 MB/s [-1.9%]
Did 342750 AES-256-GCM (1350 bytes) seal operations in 4015000us (85367.4 ops/sec): 115.2 MB/s [-0.2%]
Did 65250 AES-256-GCM (8192 bytes) seal operations in 4016000us (16247.5 ops/sec): 133.1 MB/s [-0.0%]
Penryn (Intel Core 2 Duo CPU P8600 @ 2.40GHz):
Before:
Did 1179000 AES-128-GCM (16 bytes) seal operations in 1000139us (1178836.1 ops/sec): 18.9 MB/s
Did 97000 AES-128-GCM (1350 bytes) seal operations in 1006347us (96388.2 ops/sec): 130.1 MB/s
Did 18000 AES-128-GCM (8192 bytes) seal operations in 1028943us (17493.7 ops/sec): 143.3 MB/s
Did 977000 AES-256-GCM (16 bytes) seal operations in 1000197us (976807.6 ops/sec): 15.6 MB/s
Did 82000 AES-256-GCM (1350 bytes) seal operations in 1012434us (80992.9 ops/sec): 109.3 MB/s
Did 15000 AES-256-GCM (8192 bytes) seal operations in 1006528us (14902.7 ops/sec): 122.1 MB/s
After:
Did 1306000 AES-128-GCM (16 bytes) seal operations in 1000153us (1305800.2 ops/sec): 20.9 MB/s [+10.8%]
Did 94000 AES-128-GCM (1350 bytes) seal operations in 1009852us (93082.9 ops/sec): 125.7 MB/s [-3.4%]
Did 17000 AES-128-GCM (8192 bytes) seal operations in 1012096us (16796.8 ops/sec): 137.6 MB/s [-4.0%]
Did 1070000 AES-256-GCM (16 bytes) seal operations in 1000929us (1069006.9 ops/sec): 17.1 MB/s [+9.4%]
Did 79000 AES-256-GCM (1350 bytes) seal operations in 1002209us (78825.9 ops/sec): 106.4 MB/s [-2.7%]
Did 15000 AES-256-GCM (8192 bytes) seal operations in 1061489us (14131.1 ops/sec): 115.8 MB/s [-5.2%]
Change-Id: I1c3760a77af7bee4aee3745d1c648d9e34594afb
Reviewed-on: https://boringssl-review.googlesource.com/c/34267
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: Adam Langley <agl@google.com>
The first attempt involved using Linux's support for hardware
breakpoints to detect when assembly code was run. However, this doesn't
work with SDE, which is a problem.
This version has the assembly code update a global flags variable when
it's run, but only in non-FIPS and non-debug builds.
Update-Note: Assembly files now pay attention to the NDEBUG preprocessor
symbol. Ensure the build passes the symbol in. (If release builds fail
to link due to missing BORINGSSL_function_hit, this is the cause.)
Change-Id: I6b7ced442b7a77d0b4ae148b00c351f68af89a6e
Reviewed-on: https://boringssl-review.googlesource.com/c/33384
Commit-Queue: Adam Langley <agl@google.com>
Reviewed-by: David Benjamin <davidben@google.com>
This will ensure that this code is tested in CI and is being compiled
by MSVC; previously this C code wasn't being tested at all because all
platforms we use for testing were taking other code paths.
Change-Id: If28096e677104c6109e31e31a636fee82ef4ba11
Reviewed-on: https://boringssl-review.googlesource.com/c/34266
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: Adam Langley <agl@google.com>
Since *ring* does not support AES with 192-bit keys, we can remove some
unused assembly code.
Comments are added to indicate that 192-bit key support was willfully
removed.
This extends the work done in commits
1103cf29dfbbf51f0dd8fb757084caa052863869 and
b3e91be71edde28f5d2884d3c3c34260b6a79378.
I agree to license my contributions to each file under the terms given
at the top of each file I changed.
This change syncs several assembly files from upstream. The only meaningful
additions are more CFI directives.
Change-Id: I6aec50b6fddbea297b79bae22cfd68d5c115220f
Reviewed-on: https://boringssl-review.googlesource.com/30364
Reviewed-by: Adam Langley <agl@google.com>
Merge all of these at once:
e2ff2ca0dcda4f37d9675f5d64add4a0ca239af9
ae96383af375d52f30f72554b75272fa226ca795
b9940a649afba6666b9dcea38911203c661981de
8da59555c6d6f11c3f22f8c76f09b057786f657a
f03cdc3a936a4e4f00cd8fcf978ce195db3e717e
3763cbeb6a04c0fd9915ac6606cbf0ac4d4263f5
0a3663a64f00b6337ec80d78c8945f2c77c63dba
Some of these changes had previously been merged from upstream OpenSSL
into *ring* so it's much easier to do a merge of all of these at once
to sort out the real differences.
(Imported from upstream's 753316232243ccbf86b96c1c51ffcb41651d9ad5.)
Just to sync up a bit further.
Change-Id: I805150d0f0c10d68648fae83603b0d46231ae4ec
Reviewed-on: https://boringssl-review.googlesource.com/27685
Commit-Queue: Steven Valdez <svaldez@google.com>
Reviewed-by: Steven Valdez <svaldez@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
These files are otherwise up-to-date with OpenSSL master as of
50ea9d2b3521467a11559be41dcf05ee05feabd6, modulo a couple of spelling
fixes which I've imported.
I've also reverted the same-line label and instruction patch to
x86_64-mont*.pl. The new delocate parser handles that fine.
Change-Id: Ife35c671a8104c3cc2fb6c5a03127376fccc4402
Reviewed-on: https://boringssl-review.googlesource.com/25644
Reviewed-by: Adam Langley <agl@google.com>
ARMv8 kindly deprecated most of its IT instructions in Thumb mode.
These files are taken from upstream and are used on both ARMv7 and ARMv8
processors. Accordingly, silence the warnings by marking the file as
targeting ARMv7. In other files, they were accidentally silenced anyway
by way of the existing .arch lines.
This can be reproduced by building with the new NDK and passing
-DCMAKE_ASM_FLAGS=-march=armv8-a. Some of our downstream code ends up
passing that to the assembly.
Note this change does not attempt to arrange for ARMv8-A/T32 to get
code which honors the constraints. It only silences the warnings and
continues to give it the same ARMv7-A/Thumb-2 code that backwards
compatibility dictates it continue to run.
Bug: chromium:575886, b/63131949
Change-Id: I24ce0b695942eaac799347922b243353b43ad7df
Reviewed-on: https://boringssl-review.googlesource.com/24166
Reviewed-by: Adam Langley <agl@google.com>