yggdrasil/ring - ring - alnyan's gitea

Author	SHA1	Message	Date
Brian Smith	7886603cee	Use some variant of "ring core" instead of "GFp" as a prefix for everything. "GFp_" isn't in the code at all anymore.	2021-05-02 22:09:07 -07:00
Brian Smith	384f7d056b	Replace manual FFI symbol prefixing with automatic symbol prefixing. Revert the names used in the BoringSSL C/asm code to the names used in BoringSSL. This substantially reduces the diff between ring and BoringSSL for these files. Use a variant of BoringSSL's symbol prefixing machinery to semi-automatically prefix FFI symbols with the `GFp_` prefix. The names aren't all exactly the same as before, because previously we replaced a symbol's original prefix with the `GFp_` prefix; now we're prepending `GFp_`. In the future we'll use a different prefix entirely. This paves the way for using different prefixes for each version so that multiple versions of ring can be linked into an executable at once.	2021-05-02 22:09:07 -07:00
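The prepending approach described in 384f7d056b can be illustrated with a small preprocessor sketch. This is only an assumption-laden illustration of the general technique, not ring's or BoringSSL's actual machinery: the `DEMO_*` macros and the `demo_add_one` function are hypothetical names invented for this example, and only the `GFp_` prefix comes from the commit message.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical prefix macro for this sketch; the commit prepends GFp_. */
#define DEMO_PREFIX GFp_

/* Two-level concatenation so DEMO_PREFIX is expanded before ## pastes it. */
#define DEMO_CONCAT_INNER(a, b) a##b
#define DEMO_CONCAT(a, b) DEMO_CONCAT_INNER(a, b)
#define DEMO_ADD_PREFIX(name) DEMO_CONCAT(DEMO_PREFIX, name)

/* One such line per exported symbol (a real build would generate these). */
#define demo_add_one DEMO_ADD_PREFIX(demo_add_one)

/* Written with the unprefixed name; the emitted symbol is GFp_demo_add_one. */
uint32_t demo_add_one(uint32_t x) { return x + 1u; }

/* Self-check: both spellings resolve to the same function. */
void demo_prefix_self_check(void) {
  assert(demo_add_one(1u) == 2u);
  assert(GFp_demo_add_one(1u) == 2u);
}
```

Because the source keeps the unprefixed spelling, changing `DEMO_PREFIX` per release would rename every exported symbol without touching the function bodies, which is the property the commit message relies on for linking several versions side by side.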
Brian Smith	feb692a355	Merge BoringSSL b67732a: aarch64: Remove some flavour conditionals	2020-12-01 12:30:01 -08:00
Tamas Petz	b67732a163	aarch64: Remove some flavour conditionals This change is expected to be a non-functional change. Original request: https://boringssl-review.googlesource.com/c/boringssl/+/42084 Change-Id: Ifbf85eb6cafebabf0cf063b7dd147417d01c280c Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/43584 Reviewed-by: David Benjamin <davidben@google.com> Commit-Queue: David Benjamin <davidben@google.com>	2020-10-27 17:10:21 +00:00
Brian Smith	6e500fe853	Merge BoringSSL a0b49d6: aarch64: support BTI and pointer authentication in assembly.	2020-10-19 19:54:32 -07:00
Tamas Petz	a0b49d63fd	aarch64: support BTI and pointer authentication in assembly

This change adds optional support for
- Armv8.3-A Pointer Authentication (PAuth) and
- Armv8.5-A Branch Target Identification (BTI)
features to the perl scripts.

Both features can be enabled with additional compiler flags. Unless any of these are enabled explicitly there is no code change at all.

The extensions are briefly described below. Please read the appropriate chapters of the Arm Architecture Reference Manual for the complete specification.

Scope
-----

This change only affects generated assembly code.

Armv8.3-A Pointer Authentication
--------------------------------

Pointer Authentication extension supports the authentication of the contents of registers before they are used for indirect branching or load.

PAuth provides a probabilistic method to detect corruption of register values. PAuth signing instructions generate a Pointer Authentication Code (PAC) based on the value of a register, a seed and a key. The generated PAC is inserted into the original value in the register. A PAuth authentication instruction recomputes the PAC, and if it matches the PAC in the register, restores its original value. In case of a mismatch, an architecturally unmapped address is generated instead.

With PAuth, mitigation against ROP (Return-oriented Programming) attacks can be implemented. This is achieved by signing the contents of the link-register (LR) before it is pushed to stack. Once LR is popped, it is authenticated. This way a stack corruption which overwrites the LR on the stack is detectable.

The PAuth extension adds several new instructions, some of which are not recognized by older hardware. To support a single codebase for both pre Armv8.3-A targets and newer ones, only NOP-space instructions are added by this patch. These instructions are treated as NOPs on hardware which does not support Armv8.3-A. Furthermore, this patch only considers cases where LR is saved to the stack and then restored before branching to its content. There are cases in the code where LR is pushed to stack but it is not used later. We do not address these cases as they are not affected by PAuth.

There are two keys available to sign an instruction address: A and B. PACIASP and PACIBSP only differ in the used keys: A and B, respectively. The keys are typically managed by the operating system.

To enable generating code for PAuth compile with -mbranch-protection=<mode>:
- standard or pac-ret: add PACIASP and AUTIASP, also enables BTI (read below)
- pac-ret+b-key: add PACIBSP and AUTIBSP

Armv8.5-A Branch Target Identification
--------------------------------------

Branch Target Identification features some new instructions which protect the execution of instructions on guarded pages which are not intended branch targets.

If Armv8.5-A is supported by the hardware, execution of an instruction changes the value of PSTATE.BTYPE field. If an indirect branch lands on a guarded page the target instruction must be one of the BTI <jc> flavors, or in case of a direct call or jump it can be any other instruction. If the target instruction is not compatible with the value of PSTATE.BTYPE a Branch Target Exception is generated.

In short, indirect jumps are compatible with BTI <j> and <jc> while indirect calls are compatible with BTI <c> and <jc>. Please refer to the specification for the details.

Armv8.3-A PACIASP and PACIBSP are implicit branch target identification instructions which are equivalent with BTI c or BTI jc depending on system register configuration.

BTI is used to mitigate JOP (Jump-oriented Programming) attacks by limiting the set of instructions which can be jumped to.

BTI requires active linker support to mark the pages with BTI-enabled code as guarded. For ELF64 files BTI compatibility is recorded in the .note.gnu.property section. For a shared object or static binary it is required that all linked units support BTI. This means that even a single assembly file without the required note section turns off BTI for the whole binary or shared object.

The new BTI instructions are treated as NOPs on hardware which does not support Armv8.5-A or on pages which are not guarded.

To insert this new and optional instruction compile with -mbranch-protection=standard (also enables PAuth) or +bti.

When targeting a guarded page from a non-guarded page, weaker compatibility restrictions apply to maintain compatibility between legacy and new code. For detailed rules please refer to the Arm ARM.

Compiler support
----------------

Compiler support requires understanding '-mbranch-protection=<mode>' and emitting the appropriate feature macros (__ARM_FEATURE_BTI_DEFAULT and __ARM_FEATURE_PAC_DEFAULT). The current state is the following:

-------------------------------------------------------
| Compiler | -mbranch-protection | Feature macros     |
+----------+---------------------+--------------------+
| clang    | 9.0.0               | 11.0.0             |
+----------+---------------------+--------------------+
| gcc      | 9                   | expected in 10.1+  |
-------------------------------------------------------

Available Platforms
------------------

Arm Fast Model and QEMU support both extensions.

https://developer.arm.com/tools-and-software/simulation-models/fast-models
https://www.qemu.org/

Implementation Notes
--------------------

This change adds BTI landing pads even to assembly functions which are likely to be directly called only. In these cases, landing pads might be superfluous depending on what code the linker generates. Code size and performance impact for these cases would be negligible.

Interaction with C code
-----------------------

Pointer Authentication is a per-frame protection while Branch Target Identification can be turned on and off only for all code pages of a whole shared object or static binary. Because of these properties if C/C++ code is compiled without any of the above features but assembly files support any of them unconditionally there is no incompatibility between the two.

Useful Links
------------

To fully understand the details of both PAuth and BTI it is advised to read the related chapters of the Arm Architecture Reference Manual (Arm ARM):
https://developer.arm.com/documentation/ddi0487/latest/

Additional materials:

"Providing protection for complex software"
https://developer.arm.com/architectures/learn-the-architecture/providing-protection-for-complex-software

Arm Compiler Reference Guide Version 6.14: -mbranch-protection
https://developer.arm.com/documentation/101754/0614/armclang-Reference/armclang-Command-line-Options/-mbranch-protection?lang=en

Arm C Language Extensions (ACLE)
https://developer.arm.com/docs/101028/latest

Change-Id: I4335f92e2ccc8e209c7d68a0a79f1acdf3aeb791
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/42084
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: Adam Langley <agl@google.com>	2020-08-11 23:45:04 +00:00
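The feature macros named in a0b49d63fd can be inspected from C as well as from assembly. The following is a minimal sketch, assuming the ACLE convention that bit 0 of `__ARM_FEATURE_PAC_DEFAULT` selects the A key and bit 1 the B key; the `DEMO_*` constants and `demo_branch_protection_flags` are hypothetical names for this example and do not appear in BoringSSL or ring.

```c
#include <assert.h>

/* Hypothetical flag values used only by this sketch. */
enum { DEMO_BTI = 1, DEMO_PAC_A_KEY = 2, DEMO_PAC_B_KEY = 4 };

/* Reports which branch-protection defaults this translation unit was
 * compiled with, based on the compiler-provided feature macros. */
int demo_branch_protection_flags(void) {
  int flags = 0;
#if defined(__ARM_FEATURE_BTI_DEFAULT) && __ARM_FEATURE_BTI_DEFAULT == 1
  flags |= DEMO_BTI; /* e.g. -mbranch-protection=standard or +bti */
#endif
#if defined(__ARM_FEATURE_PAC_DEFAULT)
#if (__ARM_FEATURE_PAC_DEFAULT & 1)
  flags |= DEMO_PAC_A_KEY; /* return addresses signed with the A key */
#endif
#if (__ARM_FEATURE_PAC_DEFAULT & 2)
  flags |= DEMO_PAC_B_KEY; /* return addresses signed with the B key */
#endif
#endif
  return flags;
}
```

On a target or compiler that defines neither macro, the function simply returns 0, which mirrors the commit's statement that nothing changes unless the features are enabled explicitly.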
Brian Smith	7fdc0c43ed	Merge BoringSSL dc06e32: Remove unused code from ghash-x86_64.pl.	2020-06-10 12:31:50 -05:00
Brian Smith	d3cab43a4a	Merge BoringSSL 9855c1c: Add a constant-time fallback GHASH implementation.

ring tries to work without type-punning `memcpy`, so the use of that in `GFp_gcm_ghash_nohw` was replaced by the use of `u64_from_be_bytes`. This will (I hope) also help with the eventual support for big-endian targets. Here's the diff from BoringSSL in that function:

```diff
-void gcm_ghash_nohw(uint64_t Xi[2], const u128 Htable[16], const uint8_t *inp,
-                    size_t len) {
+void GFp_gcm_ghash_nohw(uint64_t Xi[2], const u128 Htable[16], const uint8_t *inp,
+                        size_t len) {
   uint64_t swapped[2];
   swapped[0] = CRYPTO_bswap8(Xi[1]);
   swapped[1] = CRYPTO_bswap8(Xi[0]);

   while (len >= 16) {
-    uint64_t block[2];
-    OPENSSL_memcpy(block, inp, 16);
-    swapped[0] ^= CRYPTO_bswap8(block[1]);
-    swapped[1] ^= CRYPTO_bswap8(block[0]);
+    swapped[0] ^= u64_from_be_bytes(&inp[8]);
+    swapped[1] ^= u64_from_be_bytes(inp);
     gcm_polyval_nohw(swapped, &Htable[0]);
     inp += 16;
     len -= 16;
```

I also had to add a couple of (uint32_t) truncating casts where BoringSSL expects an implicit truncation to occur, to avoid `-Werror=conversion`.

During the merge, I found that `GFp_gcm_gmult_clmul` had its `.cfi_startproc` on the wrong line. I fixed that as part of the merge.

During my review of the BoringSSL changes, I noticed that BoringSSL had left some of the dead code in ghash-x86_64.pl, which had previously been removed in ring. That removal is being done in BoringSSL in [1].

[1] https://boringssl-review.googlesource.com/c/boringssl/+/41144	2020-05-04 10:54:19 -05:00
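The commit d3cab43a4a names `u64_from_be_bytes` but does not show its body. A helper of that kind can be written without `memcpy` or pointer casts by assembling the value one byte at a time. The version below is a hypothetical stand-in (named `demo_u64_from_be_bytes` to make that clear), not ring's actual implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical equivalent of a big-endian load: interprets bytes[0] as the
 * most significant byte. Works identically on little- and big-endian hosts
 * because it never reinterprets memory as a wider type. */
uint64_t demo_u64_from_be_bytes(const uint8_t bytes[8]) {
  uint64_t value = 0;
  for (size_t i = 0; i < 8; i++) {
    value = (value << 8) | bytes[i];
  }
  return value;
}

/* Self-check with a recognizable byte pattern. */
void demo_be_self_check(void) {
  const uint8_t bytes[8] = {0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08};
  assert(demo_u64_from_be_bytes(bytes) == UINT64_C(0x0102030405060708));
}
```

Since the result depends only on byte order in the input buffer, the same code would serve a big-endian target unchanged, which matches the motivation given in the commit message.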
David Benjamin	dc06e320d8	Remove unused code from ghash-x86_64.pl. Thanks to Brian Smith for pointing these out in https://boringssl-review.googlesource.com/c/boringssl/+/38724. Change-Id: I715da0778346fcc45aab19855050e18fe95a9185 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/41144 Reviewed-by: Steven Valdez <svaldez@google.com> Commit-Queue: David Benjamin <davidben@google.com>	2020-05-01 16:01:21 +00:00
Brian Smith	6f8d89072a	Merge BoringSSL d041f11: Fix cross-compile of Android on Windows.	2020-01-28 12:50:05 -06:00
David Benjamin	a2518dd077	Vectorize gcm_mul32_nohw and replace gcm_gmult_4bit_mmx. This shrinks the perf gap between nohw and 4bit_mmx. Replace 4bit_mmx and fix the last remaining variable-time GHASH implementation, covering 32-bit x86 without SSSE3. Before: Did 2065000 AES-128-GCM (16 bytes) seal operations in 1000154us (2064682.0 ops/sec): 33.0 MB/s Did 368000 AES-128-GCM (256 bytes) seal operations in 1002435us (367106.1 ops/sec): 94.0 MB/s Did 77000 AES-128-GCM (1350 bytes) seal operations in 1001225us (76905.8 ops/sec): 103.8 MB/s Did 14000 AES-128-GCM (8192 bytes) seal operations in 1067523us (13114.5 ops/sec): 107.4 MB/s Did 6572 AES-128-GCM (16384 bytes) seal operations in 1015976us (6468.7 ops/sec): 106.0 MB/s After: Did 1995000 AES-128-GCM (16 bytes) seal operations in 1000374us (1994254.1 ops/sec): 31.9 MB/s Did 319000 AES-128-GCM (256 bytes) seal operations in 1000196us (318937.5 ops/sec): 81.6 MB/s Did 66000 AES-128-GCM (1350 bytes) seal operations in 1002823us (65814.2 ops/sec): 88.8 MB/s Did 12000 AES-128-GCM (8192 bytes) seal operations in 1079294us (11118.4 ops/sec): 91.1 MB/s Did 5511 AES-128-GCM (16384 bytes) seal operations in 1006218us (5476.9 ops/sec): 89.7 MB/s (Note fallback AES is dampening the perf hit. Pairing with AESNI to roughly isolate GHASH shows a 40% hit.) That just leaves aes_nohw... Change-Id: I7d842806c54a5a057895fa2e7665633330e34b72 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/38784 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: David Benjamin <davidben@google.com>	2019-11-12 01:01:38 +00:00
David Benjamin	9855c1c59a	Add a constant-time fallback GHASH implementation. We have several variable-time table-based GHASH implementations, called "4bit" in the code. We have a fallback one in C and assembly implementations for x86, x86_64, and armv4. These are used if assembly is off or if the hardware lacks NEON or SSSE3. Note these benchmarks are all on hardware several generations beyond what would actually run this code, so it's a bit artificial. Implement a constant-time implementation of GHASH based on the notes in https://bearssl.org/constanttime.html#ghash-for-gcm, as well as the reduction algorithm described in https://crypto.stanford.edu/RealWorldCrypto/slides/gueron.pdf. This new implementation is actually faster than the fallback C code for both 32-bit and 64-bit. It is slower than the assembly implementations, particularly for 32-bit. I've left 32-bit x86 alone but replaced the x86_64 and armv4 ones. The perf hit on x86_64 is smaller and affects a small percentage of 64-bit Chrome on Windows users. ARM chips without NEON are rare (Chrome for Android requires it), so replace that too. The answer for 32-bit x86 is unclear. More 32-bit Chrome on Windows users lack SSSE3, and the perf hit is dramatic. gcm_gmult_4bit_mmx uses SSE2, so perhaps we can close the gap with an SSE2 version of this strategy, or perhaps we can decide this perf hit is worth fixing the timing leaks. 
32-bit x86 with OPENSSL_NO_ASM Before: (4bit C) Did 1136000 AES-128-GCM (16 bytes) seal operations in 1000762us (1135135.0 ops/sec): 18.2 MB/s Did 190000 AES-128-GCM (256 bytes) seal operations in 1003533us (189331.1 ops/sec): 48.5 MB/s Did 40000 AES-128-GCM (1350 bytes) seal operations in 1022114us (39134.6 ops/sec): 52.8 MB/s Did 7282 AES-128-GCM (8192 bytes) seal operations in 1117575us (6515.9 ops/sec): 53.4 MB/s Did 3663 AES-128-GCM (16384 bytes) seal operations in 1098538us (3334.4 ops/sec): 54.6 MB/s After: Did 1503000 AES-128-GCM (16 bytes) seal operations in 1000054us (1502918.8 ops/sec): 24.0 MB/s Did 252000 AES-128-GCM (256 bytes) seal operations in 1001173us (251704.8 ops/sec): 64.4 MB/s Did 53000 AES-128-GCM (1350 bytes) seal operations in 1016983us (52114.9 ops/sec): 70.4 MB/s Did 9317 AES-128-GCM (8192 bytes) seal operations in 1056367us (8819.9 ops/sec): 72.3 MB/s Did 4356 AES-128-GCM (16384 bytes) seal operations in 1000445us (4354.1 ops/sec): 71.3 MB/s 64-bit x86 with OPENSSL_NO_ASM Before: (4bit C) Did 2976000 AES-128-GCM (16 bytes) seal operations in 1000258us (2975232.4 ops/sec): 47.6 MB/s Did 510000 AES-128-GCM (256 bytes) seal operations in 1000295us (509849.6 ops/sec): 130.5 MB/s Did 106000 AES-128-GCM (1350 bytes) seal operations in 1001573us (105833.5 ops/sec): 142.9 MB/s Did 18000 AES-128-GCM (8192 bytes) seal operations in 1003895us (17930.2 ops/sec): 146.9 MB/s Did 9000 AES-128-GCM (16384 bytes) seal operations in 1003352us (8969.9 ops/sec): 147.0 MB/s After: Did 2972000 AES-128-GCM (16 bytes) seal operations in 1000178us (2971471.1 ops/sec): 47.5 MB/s Did 515000 AES-128-GCM (256 bytes) seal operations in 1001850us (514049.0 ops/sec): 131.6 MB/s Did 108000 AES-128-GCM (1350 bytes) seal operations in 1004941us (107469.0 ops/sec): 145.1 MB/s Did 19000 AES-128-GCM (8192 bytes) seal operations in 1034966us (18358.1 ops/sec): 150.4 MB/s Did 9250 AES-128-GCM (16384 bytes) seal operations in 1005269us (9201.5 ops/sec): 150.8 MB/s 32-bit ARM 
without NEON Before: (4bit armv4 asm) Did 952000 AES-128-GCM (16 bytes) seal operations in 1001009us (951040.4 ops/sec): 15.2 MB/s Did 152000 AES-128-GCM (256 bytes) seal operations in 1005576us (151157.1 ops/sec): 38.7 MB/s Did 32000 AES-128-GCM (1350 bytes) seal operations in 1024522us (31234.1 ops/sec): 42.2 MB/s Did 5290 AES-128-GCM (8192 bytes) seal operations in 1005335us (5261.9 ops/sec): 43.1 MB/s Did 2650 AES-128-GCM (16384 bytes) seal operations in 1004396us (2638.4 ops/sec): 43.2 MB/s After: Did 540000 AES-128-GCM (16 bytes) seal operations in 1000009us (539995.1 ops/sec): 8.6 MB/s Did 90000 AES-128-GCM (256 bytes) seal operations in 1000028us (89997.5 ops/sec): 23.0 MB/s Did 19000 AES-128-GCM (1350 bytes) seal operations in 1022041us (18590.3 ops/sec): 25.1 MB/s Did 3150 AES-128-GCM (8192 bytes) seal operations in 1003199us (3140.0 ops/sec): 25.7 MB/s Did 1694 AES-128-GCM (16384 bytes) seal operations in 1076156us (1574.1 ops/sec): 25.8 MB/s (Note fallback AES is dampening the perf hit.) 
64-bit x86 with OPENSSL_ia32cap=0 Before: (4bit x86_64 asm) Did 2615000 AES-128-GCM (16 bytes) seal operations in 1000220us (2614424.8 ops/sec): 41.8 MB/s Did 431000 AES-128-GCM (256 bytes) seal operations in 1001250us (430461.9 ops/sec): 110.2 MB/s Did 89000 AES-128-GCM (1350 bytes) seal operations in 1002209us (88803.8 ops/sec): 119.9 MB/s Did 16000 AES-128-GCM (8192 bytes) seal operations in 1064535us (15030.0 ops/sec): 123.1 MB/s Did 8261 AES-128-GCM (16384 bytes) seal operations in 1096787us (7532.0 ops/sec): 123.4 MB/s After: Did 2355000 AES-128-GCM (16 bytes) seal operations in 1000096us (2354773.9 ops/sec): 37.7 MB/s Did 373000 AES-128-GCM (256 bytes) seal operations in 1000981us (372634.4 ops/sec): 95.4 MB/s Did 77000 AES-128-GCM (1350 bytes) seal operations in 1003557us (76727.1 ops/sec): 103.6 MB/s Did 13000 AES-128-GCM (8192 bytes) seal operations in 1003058us (12960.4 ops/sec): 106.2 MB/s Did 7139 AES-128-GCM (16384 bytes) seal operations in 1099576us (6492.5 ops/sec): 106.4 MB/s (Note fallback AES is dampening the perf hit. Pairing with AESNI to roughly isolate GHASH shows a 40% hit.) For comparison, this is what removing gcm_gmult_4bit_mmx would do. 
32-bit x86 with OPENSSL_ia32cap=0 Before: Did 2014000 AES-128-GCM (16 bytes) seal operations in 1000026us (2013947.6 ops/sec): 32.2 MB/s Did 367000 AES-128-GCM (256 bytes) seal operations in 1000097us (366964.4 ops/sec): 93.9 MB/s Did 77000 AES-128-GCM (1350 bytes) seal operations in 1002135us (76836.0 ops/sec): 103.7 MB/s Did 13000 AES-128-GCM (8192 bytes) seal operations in 1011394us (12853.5 ops/sec): 105.3 MB/s Did 7227 AES-128-GCM (16384 bytes) seal operations in 1099409us (6573.5 ops/sec): 107.7 MB/s If gcm_gmult_4bit_mmx were replaced: Did 1350000 AES-128-GCM (16 bytes) seal operations in 1000128us (1349827.2 ops/sec): 21.6 MB/s Did 219000 AES-128-GCM (256 bytes) seal operations in 1000090us (218980.3 ops/sec): 56.1 MB/s Did 46000 AES-128-GCM (1350 bytes) seal operations in 1017365us (45214.8 ops/sec): 61.0 MB/s Did 8393 AES-128-GCM (8192 bytes) seal operations in 1115579us (7523.4 ops/sec): 61.6 MB/s Did 3840 AES-128-GCM (16384 bytes) seal operations in 1001928us (3832.6 ops/sec): 62.8 MB/s (Note fallback AES is dampening the perf hit. Pairing with AESNI to roughly isolate GHASH shows a 73% hit. gcm_gmult_4bit_mmx is almost 4x as fast.) Change-Id: Ib28c981e92e200b17fb9ddc89aef695ac6733a43 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/38724 Commit-Queue: David Benjamin <davidben@google.com> Reviewed-by: Adam Langley <agl@google.com>	2019-11-12 00:27:02 +00:00
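The BearSSL notes linked from 9855c1c59a describe computing a carry-less (GF(2)[X]) product with ordinary integer multiplications by spacing the operand bits out so that carries cannot reach the next useful bit. The sketch below shows that "multiplication with holes" idea for 32-bit operands, keeping only the low 32 bits of the product. It is an illustration of the technique under that assumption, not the code BoringSSL or ring ships; `demo_bmul32` and `demo_bmul32_ref` are names made up for this example, and the approach is constant-time only on CPUs whose integer multiplier is itself constant-time.

```c
#include <assert.h>
#include <stdint.h>

/* Low 32 bits of the carry-less product of x and y. Each operand is split
 * into four words holding every fourth bit; any slot then accumulates at
 * most 8 partial products, so carries stay inside its 3 spare bits. */
uint32_t demo_bmul32(uint32_t x, uint32_t y) {
  uint32_t x0 = x & 0x11111111u, x1 = x & 0x22222222u;
  uint32_t x2 = x & 0x44444444u, x3 = x & 0x88888888u;
  uint32_t y0 = y & 0x11111111u, y1 = y & 0x22222222u;
  uint32_t y2 = y & 0x44444444u, y3 = y & 0x88888888u;

  /* Group partial products by (bit index mod 4) of the result. */
  uint32_t z0 = (x0 * y0) ^ (x1 * y3) ^ (x2 * y2) ^ (x3 * y1);
  uint32_t z1 = (x0 * y1) ^ (x1 * y0) ^ (x2 * y3) ^ (x3 * y2);
  uint32_t z2 = (x0 * y2) ^ (x1 * y1) ^ (x2 * y0) ^ (x3 * y3);
  uint32_t z3 = (x0 * y3) ^ (x1 * y2) ^ (x2 * y1) ^ (x3 * y0);

  /* Keep only the parity bit of each slot, discarding the carry bits. */
  z0 &= 0x11111111u;
  z1 &= 0x22222222u;
  z2 &= 0x44444444u;
  z3 &= 0x88888888u;
  return z0 | z1 | z2 | z3;
}

/* Shift-and-XOR reference used only to cross-check the sketch. */
uint32_t demo_bmul32_ref(uint32_t x, uint32_t y) {
  uint32_t z = 0;
  for (int i = 0; i < 32; i++) {
    uint32_t mask = 0u - ((y >> i) & 1u); /* all ones if bit i of y is set */
    z ^= (x << i) & mask;
  }
  return z;
}
```

For example, `demo_bmul32(3, 3)` yields 5, since (X + 1)^2 = X^2 + 1 over GF(2). A full GHASH would additionally need the high half of the product and the reduction step described in the second link of the commit message, both of which this sketch leaves out.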
David Benjamin	0de64a749b	Make the dispatch tests opt-in. The assembly dispatch tests currently assume NDEBUG is consistently defined between C/C++ and assembly. While this is usually the case for UNIX, CMake does not pass NDEBUG to NASM. This is giving gRPC some difficulties in updating BoringSSL, so switch it to an opt-in -DBORINGSSL_DISPATCH_TEST flag instead. Update-Note: If you were copying NDEBUG over to assembly files, that's no longer required (though it's harmless to leave it in). If you want to run ImplDispatchTest.*, build both C/C++ and assembly with -DBORINGSSL_DISPATCH_TEST in your debug builds. (Don't enable it in release builds. It causes assembly to scribble in some globals.) Change-Id: I9ab3371dc0f0a40b27b44ef93835e007a6346900 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/37764 Commit-Queue: David Benjamin <davidben@google.com> Reviewed-by: Adam Langley <agl@google.com>	2019-09-27 19:02:43 +00:00
David Benjamin	d041f11134	Fix cross-compile of Android on Windows. When running the ARM perlasm files on Windows, close STDOUT fails. There appears to be some weird quirk on Windows when one replaces STDOUT with a pipe. The x86_64.pl files all avoid this by opening OUT and then setting STDOUT=OUT. Align all the ARM files with that pattern. See https://ci.appveyor.com/project/conscrypt/conscrypt Change-Id: Ibee9427a05d806f7f23a6d9817394cfabf2f534a Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/37324 Reviewed-by: Kenny Root <kroot@google.com> Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>	2019-09-04 17:20:44 +00:00
Brian Smith	eb93d699e8	Remove redundant GCM code.	2019-07-12 18:59:23 -10:00
Brian Smith	b3f9a918e5	Enable NEON fallback implementation of GCM on AArch64.	2019-07-02 16:13:32 -10:00
Brian Smith	88596b8d33	Merge BoringSSL c1d8c5b: Handle errors from close in perlasm scripts.	2019-07-02 10:00:48 -10:00
Brian Smith	0c21917a7f	Merge BoringSSL d22578f: Adapt gcm_*_neon to aarch64.	2019-07-01 14:48:16 -10:00
Brian Smith	d1e9b5ba3a	Take BoringSSL 8d685ec: modes/asm/ghash-armv4.pl: address "infixes are deprecated" warnings.	2019-07-01 14:35:41 -10:00
David Benjamin	c1d8c5b0e0	Handle errors from close in perlasm scripts. If the xlate filter script fails, the outer script swallows the error, unless we check the return value of close. Change-Id: Ib506bb745a5d27b9d1df9329535bf81ad090f41f Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/35724 Reviewed-by: Adam Langley <agl@google.com>	2019-04-26 18:03:21 +00:00
David Benjamin	d22578f366	Adapt gcm_*_neon to aarch64. This makes AES-GCM always constant-time on aarch64 (provided assembly is enabled). Unlike vpaes, this does come at a binary size penalty of 1K compared to the gcm_*_4bit version. ABI testing already covered by GCMTest.ABI (GHASH_ASM_ARM covers both OPENSSL_ARM and OPENSSL_AARCH64.) Cortex-A53 (Raspberry Pi 3 Model B+) Before: Did 274000 AES-128-GCM (16 bytes) seal operations in 1003461us (273055.0 ops/sec): 4.4 MB/s Did 53000 AES-128-GCM (256 bytes) seal operations in 1007689us (52595.6 ops/sec): 13.5 MB/s Did 12000 AES-128-GCM (1350 bytes) seal operations in 1075908us (11153.4 ops/sec): 15.1 MB/s Did 2068 AES-128-GCM (8192 bytes) seal operations in 1089037us (1898.9 ops/sec): 15.6 MB/s After: Did 298000 AES-128-GCM (16 bytes) seal operations in 1002917us (297133.3 ops/sec): 4.8 MB/s Did 64000 AES-128-GCM (256 bytes) seal operations in 1001124us (63928.1 ops/sec): 16.4 MB/s Did 14000 AES-128-GCM (1350 bytes) seal operations in 1015477us (13786.6 ops/sec): 18.6 MB/s Did 2497 AES-128-GCM (8192 bytes) seal operations in 1057951us (2360.2 ops/sec): 19.3 MB/s Bug: 265 Change-Id: I251bf0f2eae0578580bb14192755e5d8ff64cd14 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/35285 Reviewed-by: Adam Langley <agl@google.com>	2019-03-14 21:43:27 +00:00
David Benjamin	8d685ec867	modes/asm/ghash-armv4.pl: address "infixes are deprecated" warnings. This imports ce5eb5e8149d8d03660575f4b8504c993851988a and 1212818eb07add297fe562eba80ac46a9893781e from OpenSSL's 1.1.1 branch. Change-Id: I121c0771371697191a163a28d972a7b3cee37762 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/35164 Reviewed-by: Adam Langley <agl@google.com>	2019-03-05 17:52:28 +00:00
David Benjamin	5ce12e6436	Add a 32-bit SSSE3 GHASH implementation. The 64-bit version can be fairly straightforwardly translated. Ironically, this makes 32-bit x86 the first architecture to meet the goal of constant-time AES-GCM given SIMD assembly. (Though x86_64 could join by simply giving up on bsaes...) Bug: 263 Change-Id: Icb2cec936457fac7132bbb5dbb094433bc14b86e Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/35024 Commit-Queue: David Benjamin <davidben@google.com> Reviewed-by: Adam Langley <agl@google.com>	2019-03-04 19:02:52 +00:00
Brian Smith	4af426b0f3	Merge non-test parts of BoringSSL 73b1f18: Add ABI tests for GCM.	2019-02-11 15:29:09 -10:00
David Benjamin	0a67eba62d	Fix the order of Windows unwind codes. The unwind tester suggests Windows doesn't care, but the documentation says that unwind codes should be sorted in descending offset, which means the last instruction should be first. https://docs.microsoft.com/en-us/cpp/build/exception-handling-x64?view=vs-2017#struct-unwind_code Bug: 259 Change-Id: I21e54c362e18e0405f980005112cc3f7c417c70c Reviewed-on: https://boringssl-review.googlesource.com/c/34785 Commit-Queue: David Benjamin <davidben@google.com> Reviewed-by: Adam Langley <agl@google.com>	2019-02-05 19:38:23 +00:00
Brian Smith	3dfbe3bf6b	Do GCM CPU feature detection in Rust. Rename some GCM assembly functions so that all functions that do the same thing the same way have the same name, to make the dispatching logic simpler. Thread CPU feature caching witnesses through the GCM dispatching logic to make feature detection less error-prone. Start an internal Rust API for feature detection.	2019-01-28 14:33:31 -10:00
David Benjamin	4545503926	Add a constant-time pshufb-based GHASH implementation. We currently require clmul instructions for constant-time GHASH on x86_64. Otherwise, it falls back to a variable-time 4-bit table implementation. However, a significant proportion of clients lack these instructions. Inspired by vpaes, we can use pshufb and a slightly different order of incorporating the bits to make a constant-time GHASH. This requires SSSE3, which is very common. Benchmarking old machines we had on hand, it appears to be a no-op on Sandy Bridge and a small slowdown for Penryn. Sandy Bridge (Intel Pentium CPU 987 @ 1.50GHz): (Note: these numbers are before 16-byte-aligning the table. That was an improvement on Penryn, so it's possible Sandy Bridge is now better.) Before: Did 4244750 AES-128-GCM (16 bytes) seal operations in 4015000us (1057222.9 ops/sec): 16.9 MB/s Did 442000 AES-128-GCM (1350 bytes) seal operations in 4016000us (110059.8 ops/sec): 148.6 MB/s Did 84000 AES-128-GCM (8192 bytes) seal operations in 4015000us (20921.5 ops/sec): 171.4 MB/s Did 3349250 AES-256-GCM (16 bytes) seal operations in 4016000us (833976.6 ops/sec): 13.3 MB/s Did 343500 AES-256-GCM (1350 bytes) seal operations in 4016000us (85532.9 ops/sec): 115.5 MB/s Did 65250 AES-256-GCM (8192 bytes) seal operations in 4015000us (16251.6 ops/sec): 133.1 MB/s After: Did 4229250 AES-128-GCM (16 bytes) seal operations in 4016000us (1053100.1 ops/sec): 16.8 MB/s [-0.4%] Did 442250 AES-128-GCM (1350 bytes) seal operations in 4016000us (110122.0 ops/sec): 148.7 MB/s [+0.1%] Did 83500 AES-128-GCM (8192 bytes) seal operations in 4015000us (20797.0 ops/sec): 170.4 MB/s [-0.6%] Did 3286500 AES-256-GCM (16 bytes) seal operations in 4016000us (818351.6 ops/sec): 13.1 MB/s [-1.9%] Did 342750 AES-256-GCM (1350 bytes) seal operations in 4015000us (85367.4 ops/sec): 115.2 MB/s [-0.2%] Did 65250 AES-256-GCM (8192 bytes) seal operations in 4016000us (16247.5 ops/sec): 133.1 MB/s [-0.0%] Penryn (Intel Core 2 Duo CPU 
P8600 @ 2.40GHz): Before: Did 1179000 AES-128-GCM (16 bytes) seal operations in 1000139us (1178836.1 ops/sec): 18.9 MB/s Did 97000 AES-128-GCM (1350 bytes) seal operations in 1006347us (96388.2 ops/sec): 130.1 MB/s Did 18000 AES-128-GCM (8192 bytes) seal operations in 1028943us (17493.7 ops/sec): 143.3 MB/s Did 977000 AES-256-GCM (16 bytes) seal operations in 1000197us (976807.6 ops/sec): 15.6 MB/s Did 82000 AES-256-GCM (1350 bytes) seal operations in 1012434us (80992.9 ops/sec): 109.3 MB/s Did 15000 AES-256-GCM (8192 bytes) seal operations in 1006528us (14902.7 ops/sec): 122.1 MB/s After: Did 1306000 AES-128-GCM (16 bytes) seal operations in 1000153us (1305800.2 ops/sec): 20.9 MB/s [+10.8%] Did 94000 AES-128-GCM (1350 bytes) seal operations in 1009852us (93082.9 ops/sec): 125.7 MB/s [-3.4%] Did 17000 AES-128-GCM (8192 bytes) seal operations in 1012096us (16796.8 ops/sec): 137.6 MB/s [-4.0%] Did 1070000 AES-256-GCM (16 bytes) seal operations in 1000929us (1069006.9 ops/sec): 17.1 MB/s [+9.4%] Did 79000 AES-256-GCM (1350 bytes) seal operations in 1002209us (78825.9 ops/sec): 106.4 MB/s [-2.7%] Did 15000 AES-256-GCM (8192 bytes) seal operations in 1061489us (14131.1 ops/sec): 115.8 MB/s [-5.2%] Change-Id: I1c3760a77af7bee4aee3745d1c648d9e34594afb Reviewed-on: https://boringssl-review.googlesource.com/c/34267 Commit-Queue: David Benjamin <davidben@google.com> Reviewed-by: Adam Langley <agl@google.com>	2019-01-24 17:19:21 +00:00
Adam Langley	c1615719ce	Add test of assembly code dispatch. The first attempt involved using Linux's support for hardware breakpoints to detect when assembly code was run. However, this doesn't work with SDE, which is a problem. This version has the assembly code update a global flags variable when it's run, but only in non-FIPS and non-debug builds. Update-Note: Assembly files now pay attention to the NDEBUG preprocessor symbol. Ensure the build passes the symbol in. (If release builds fail to link due to missing BORINGSSL_function_hit, this is the cause.) Change-Id: I6b7ced442b7a77d0b4ae148b00c351f68af89a6e Reviewed-on: https://boringssl-review.googlesource.com/c/33384 Commit-Queue: Adam Langley <agl@google.com> Reviewed-by: David Benjamin <davidben@google.com>	2019-01-22 20:22:53 +00:00
Brian Smith	0cd9bf6f64	Use C instead of assembly fallback code in GCM on X86_64. This will ensure that this code is tested in CI and is being compiled by MSVC; previously this C code wasn't being tested at all because all platforms we use for testing were taking other code paths.	2019-01-18 12:40:41 -10:00
David Benjamin	73b1f181b6	Add ABI tests for GCM. Change-Id: If28096e677104c6109e31e31a636fee82ef4ba11 Reviewed-on: https://boringssl-review.googlesource.com/c/34266 Commit-Queue: David Benjamin <davidben@google.com> Reviewed-by: Adam Langley <agl@google.com>	2019-01-15 22:49:37 +00:00
Adam Langley	b5ee32ff4b	Cherry-pick BoringSSL 6410e18: Update several assembly files from upstream. I'm not sure why this was skipped when it came up in the merge queue the first time. :(	2019-01-04 14:35:44 -10:00
Brian Smith	b9b2e57a59	Remove dead code in aesni-gcm-x86_64.pl.	2018-11-27 00:56:43 -10:00
Dylan MacKenzie	b5153f0e38	Remove 192-bit key support from AES-NI assembly. Since ring does not support AES with 192-bit keys, we can remove some unused assembly code. Comments are added to indicate that 192-bit key support was willfully removed. This extends the work done in commits 1103cf29dfbbf51f0dd8fb757084caa052863869 and b3e91be71edde28f5d2884d3c3c34260b6a79378. I agree to license my contributions to each file under the terms given at the top of each file I changed.	2018-10-24 20:00:56 -10:00
Adam Langley	6410e18e91	Update several assembly files from upstream. This change syncs several assembly files from upstream. The only meaningful additions are more CFI directives. Change-Id: I6aec50b6fddbea297b79bae22cfd68d5c115220f Reviewed-on: https://boringssl-review.googlesource.com/30364 Reviewed-by: Adam Langley <agl@google.com>	2018-08-07 18:57:17 +00:00
Brian Smith	ee13e738a2	Remove `gcm_gmult_avx`. `gcm_gmult_avx` is just a stub that jumps to `gmult_clmul`. Remove it and replace the usages of it with calls to `gmult_clmul`.	2018-05-17 13:40:06 -10:00
Brian Smith	ea2ffe4409	Take BoringSSL 06d467c: ghashv8-armx.pl: add Qualcomm Kryo results.	2018-05-11 16:19:11 -10:00
Brian Smith	544c62465e	Take BoringSSL a7c8f2b: ghashv8-armvx.pl: Fix various typos.	2018-05-11 16:18:52 -10:00
Brian Smith	8b7aab1f01	Merge BoringSSL 6dc9942: Sync up some perlasm license headers and easy fixes.	2018-05-11 09:22:10 -10:00
Brian Smith	faec78525f	Take BoringSSL 875095a: Silence ARMv8 deprecated IT instruction warnings.	2018-05-01 07:58:29 -10:00
Brian Smith	cc01f0c839	Merge BoringSSL 4358f10: Remove clang assembler .arch workaround.	2018-05-01 07:54:48 -10:00
Brian Smith	3cd39b12a8	Merge BoringSSL 0967853: Add CFI start/end for _aesni_ctr32[_ghash]_6x	2018-04-30 14:04:18 -10:00
Brian Smith	ab5feaa912	Merge BoringSSL ee2c1f3: aesni-gcm-x86_64.pl: sync CFI directives from upstream.	2018-04-30 14:03:57 -10:00
Brian Smith	cde1e66f9a	Merge BoringSSL d4e3795: x86_64 assembly pack: "optimize" for Knights Landing, add AVX-512 results.	2018-04-29 09:15:58 -10:00
Brian Smith	da15550ca6	Merge BoringSSL 7f7ef53..0a3663a. Merge all of these at once: e2ff2ca0dcda4f37d9675f5d64add4a0ca239af9 ae96383af375d52f30f72554b75272fa226ca795 b9940a649afba6666b9dcea38911203c661981de 8da59555c6d6f11c3f22f8c76f09b057786f657a f03cdc3a936a4e4f00cd8fcf978ce195db3e717e 3763cbeb6a04c0fd9915ac6606cbf0ac4d4263f5 0a3663a64f00b6337ec80d78c8945f2c77c63dba Some of these changes had previously been merged from upstream OpenSSL into ring so it's much easier to do a merge of all of these at once to sort out the real differences.	2018-04-28 17:40:15 -10:00
Brian Smith	5cdd83f01e	Merge BoringSSL 583c12e: Remove filename argument to x86 asm_init.	2018-04-28 16:07:06 -10:00
David Benjamin	06d467c58a	ghashv8-armx.pl: add Qualcomm Kryo results. (Imported from upstream's 753316232243ccbf86b96c1c51ffcb41651d9ad5.) Just to sync up a bit further. Change-Id: I805150d0f0c10d68648fae83603b0d46231ae4ec Reviewed-on: https://boringssl-review.googlesource.com/27685 Commit-Queue: Steven Valdez <svaldez@google.com> Reviewed-by: Steven Valdez <svaldez@google.com> CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>	2018-04-24 19:48:59 +00:00
David Benjamin	a7c8f2b7b0	ghashv8-armvx.pl: Fix various typos. (Imported from upstream's 46f4e1bec51dc96fa275c168752aa34359d9ee51.) Change-Id: Ie9c1e9cfc38a3962e3674a68bc0174d064272fc2 Reviewed-on: https://boringssl-review.googlesource.com/27684 Commit-Queue: Steven Valdez <svaldez@google.com> Reviewed-by: Steven Valdez <svaldez@google.com> CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>	2018-04-24 19:48:49 +00:00
David Benjamin	6dc994265e	Sync up some perlasm license headers and easy fixes. These files are otherwise up-to-date with OpenSSL master as of 50ea9d2b3521467a11559be41dcf05ee05feabd6, modulo a couple of spelling fixes which I've imported. I've also reverted the same-line label and instruction patch to x86_64-mont*.pl. The new delocate parser handles that fine. Change-Id: Ife35c671a8104c3cc2fb6c5a03127376fccc4402 Reviewed-on: https://boringssl-review.googlesource.com/25644 Reviewed-by: Adam Langley <agl@google.com>	2018-02-11 01:00:35 +00:00
Brian Smith	affdca5d1c	Merge BoringSSL 0648129: Move modes/ into the FIPS module.	2018-01-09 16:56:49 -10:00
David Benjamin	875095aa7c	Silence ARMv8 deprecated IT instruction warnings. ARMv8 kindly deprecated most of its IT instructions in Thumb mode. These files are taken from upstream and are used on both ARMv7 and ARMv8 processors. Accordingly, silence the warnings by marking the file as targeting ARMv7. In other files, they were accidentally silenced anyway by way of the existing .arch lines. This can be reproduced by building with the new NDK and passing -DCMAKE_ASM_FLAGS=-march=armv8-a. Some of our downstream code ends up passing that to the assembly. Note this change does not attempt to arrange for ARMv8-A/T32 to get code which honors the constraints. It only silences the warnings and continues to give it the same ARMv7-A/Thumb-2 code that backwards compatibility dictates it continue to run. Bug: chromium:575886, b/63131949 Change-Id: I24ce0b695942eaac799347922b243353b43ad7df Reviewed-on: https://boringssl-review.googlesource.com/24166 Reviewed-by: Adam Langley <agl@google.com>	2017-12-14 01:56:22 +00:00


62 Commits