Diff: Hardware Specification

@@ -1,12 +1,12 @@
1	1	<div style="font-size: 0.85em; color: #656d76; margin-bottom: 1em; padding: 0.5em; background: #f6f8fa; border-radius: 4px;">
2		-📄 Source: <a href="https://github.com/chipsalliance/caliptra-rtl/blob/35b0bc5691b2bd0fc180403914cfabe207379089/docs/CaliptraHardwareSpecification.md" target="_blank">chipsalliance/caliptra-rtl/docs/CaliptraHardwareSpecification.md</a> @ <code>35b0bc5</code>
	2	+📄 Source: <a href="https://github.com/chipsalliance/caliptra-rtl/blob/8016178f1f699c59a8e1465d59079203b7ce49b0/docs/CaliptraHardwareSpecification.md" target="_blank">chipsalliance/caliptra-rtl/docs/CaliptraHardwareSpecification.md</a> @ <code>8016178</code>
3	3	</div>
4	4
5	5	![OCP Logo](../images/caliptra-rtl/docs/images/OCP_logo.png)
6	6
7	7	<p style="text-align: center;">Caliptra Hardware Specification</p>
8	8
9		-<p style="text-align: center;">Revision 2.0.3</p>
	9	+<p style="text-align: center;">Revision 2.1</p>
10	10
11	11	<div style="page-break-after: always"></div>
12	12
@@ -28,10 +28,10 @@
28	28	* Caliptra uC may use internally in mailbox mode or via the Caliptra AXI DMA assist engine in streaming mode
29	29	* SHA Accelerator adds new SHA save/restore functionality
30	30	* Adams Bridge Dilithium/ML-DSA (refer to [Adams bridge spec](https://github.com/chipsalliance/adams-bridge/blob/main/docs/AdamsBridgeHardwareSpecification.md))
31		-* Subsystem mode support (refer to [Subsystem Specification](https://github.com/chipsalliance/caliptra-ss/blob/main/docs/Caliptra%202.0%20Subsystem%20Specification%201.pdf) for details)
	31	+* Subsystem mode support (refer to [Subsystem Specification](https://github.com/chipsalliance/caliptra-ss/blob/main/docs/CaliptraSSIntegrationSpecification.md) for details)
32	32	* ECDH hardware support
33	33	* HMAC512 hardware support
34		- * AXI Manager with DMA support (refer to [DMA Specification](https://github.com/chipsalliance/caliptra-ss/blob/main/docs/CaliptraSSHardwareSpecification.md#caliptra-axi-manager--dma-assist))
	34	+ * AXI Manager with DMA support (refer to [DMA Specification](https://github.com/chipsalliance/caliptra-ss/blob/main/docs/CaliptraSSHardwareSpecification.md#caliptra-core-axi-manager--dma-assist))
35	35	* Manufacturing and Debug Unlock
36	36	* UDS programming
37	37	* Read logic for Secret Fuses
@@ -39,29 +39,38 @@
39	39	* RISC-V core PMP support
40	40	* CSR HMAC key for manufacturing flow
41	41
	42	+## Key Caliptra 2.1 Changes
	43	+* AXI Manager DMA AES feature for OCP L.O.C.K. support (refer to [DMA Specification](https://github.com/chipsalliance/caliptra-ss/blob/main/docs/CaliptraSSHardwareSpecification.md#caliptra-core-axi-manager--dma-assist))
	44	+* [AES Big Endian mode](#aes-endian)
	45	+* [External Staging Area](./CaliptraIntegrationSpecification.md#external-staging-area)
	46	+* [OCP LOCK Support](#ocp-lock-hardware-architecture)
	47	+* [SHA3](#sha3)
	48	+* [ML-KEM](#adams-bridge-kyber-ml-kem)
	49	+
	50	+
42	51	## Boot FSM
43	52
44	53	The Boot FSM detects that the SoC is bringing Caliptra out of reset. Part of this flow involves signaling to the SoC that Caliptra is awake and ready for fuses. After fuses are populated and the SoC indicates that it is done downloading fuses, Caliptra can wake up the rest of the IP by de-asserting the internal reset.
45	54
46		-The following figure shows the initial power-on arc of the Mailbox Boot FSM.
47		-
48		-Figure 1: Mailbox Boot FSM state diagram
49		-
50		-![](../images/caliptra-rtl/docs/images/HW_mbox_boot_fsm.png)
	55	+The following figure shows the state transitions and associated actions in Caliptra's boot state machine.
	56	+
	57	+Figure: Caliptra Boot FSM state diagram
	58	+
	59	+![](../images/caliptra-rtl/docs/images/Caliptra_boot_fsm.png)
51	60
52	61	The Boot FSM first waits for the SoC to assert cptra\_pwrgood and de-assert cptra\_rst\_b. In the BOOT\_FUSE state, Caliptra signals to the SoC that it is ready for fuses. After the SoC is done writing fuses, it sets the fuse done register and the FSM advances to BOOT\_DONE.
53	62
54		-BOOT\_DONE enables Caliptra reset de-assertion through a two flip-flop synchronizer.
55		-
56		-## FW update reset (Impactless FW update)
57		-
58		-When a firmware update is initiated, Runtime FW writes to fw\_update\_reset register to trigger the FW update reset. When this register is written, only the RISC-V core is reset using cptra\_uc\_fw\_rst\_b pin and all AHB targets are still active. All registers within the targets and ICCM/DCCM memories are intact after the reset. Reset is deasserted synchronously after a programmable number of cycles; the minimum allowed number of wait cycles is 5, which is also the default configured value. Reset de-assertion is done through a two flip-flop synchronizer. Since ICCM is locked during runtime, the boot FSM unlocks it when the RISC-V reset is asserted. Following FW update reset deassertion, normal boot flow updates the ICCM with the new FW from the mailbox SRAM. The boot flow is modified as shown in the following figure.
59		-
60		-Figure 2: Mailbox Boot FSM state diagram for FW update reset
61		-
62		-![](../images/caliptra-rtl/docs/images/mbox_boot_fsm_FW_update_reset.png)
	63	+Once in the BOOT\_DONE state, Caliptra de-asserts resets through a two flip-flop synchronizer.
	64	+
	65	+### FW update reset (Impactless FW update)
	66	+
	67	+When a firmware update is initiated, Runtime FW writes to the fw\_update\_reset register to trigger the FW update reset. When this register is written, only the RISC-V core is reset, using the cptra\_uc\_rst\_b pin, while all AHB targets remain active. All registers within the targets and the ICCM/DCCM memories remain intact across the reset. Reset is deasserted synchronously after a programmable number of cycles; the minimum allowed number of wait cycles is 5, which is also the default configured value. Reset de-assertion is done through a two flip-flop synchronizer. Because ICCM is locked during runtime, the boot FSM unlocks it when the RISC-V reset is asserted. Following FW update reset deassertion, the normal boot flow updates the ICCM with the new FW from the mailbox SRAM.
63	68
64	69	Impactless firmware updates may be initiated by writing to the fw\_update\_reset register after Caliptra comes out of global reset and enters the BOOT\_DONE state. In the BOOT\_FWRST state, only the reset to the RISC-V core is asserted and the wait timer is initialized. After the timer expires, the FSM advances from the BOOT\_WAIT to BOOT\_DONE state where the reset is deasserted and ICCM is unlocked.
	70	+
	71	+### Breakpoints for Debug
	72	+
	73	+Integrators may connect a breakpoint input to Caliptra; this input is intended to be driven by a chip GPIO pin. When it is asserted, the Caliptra boot FSM follows a modified arc: instead of transitioning directly to the BOOT_DONE state once fuse programming completes, the state machine transitions from BOOT_FUSE to BOOT_WAIT. It halts in BOOT_WAIT until the Caliptra register [CPTRA_BOOTFSM_GO](https://chipsalliance.github.io/caliptra-rtl/main/internal-regs/?p=clp.soc_ifc_reg.CPTRA_BOOTFSM_GO) is set, by either AXI or TAP access.
65	74
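As a reading aid, the Boot FSM arcs described above (cold boot, the breakpoint path, and the impactless FW update reset) can be modeled behaviorally. The following Python sketch is not derived from the RTL: the idle state name `BOOT_IDLE`, the `Inputs` fields, and the one-decrement-per-step wait timer are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum, auto


class BootState(Enum):
    BOOT_IDLE = auto()   # assumed name: waiting for pwrgood / rst_b
    BOOT_FUSE = auto()   # Caliptra ready for fuses
    BOOT_WAIT = auto()   # breakpoint halt, or FW-reset wait timer
    BOOT_DONE = auto()   # resets released
    BOOT_FWRST = auto()  # RISC-V core reset for impactless FW update


MIN_WAIT_CYCLES = 5  # minimum (and default) FW-reset wait from the text


@dataclass
class Inputs:
    pwrgood: bool = False
    rst_b: bool = False           # active low: True means de-asserted
    fuse_done: bool = False
    breakpoint: bool = False      # GPIO breakpoint input
    bootfsm_go: bool = False      # CPTRA_BOOTFSM_GO
    fw_update_reset: bool = False


class BootFsm:
    def __init__(self, wait_cycles: int = MIN_WAIT_CYCLES) -> None:
        if wait_cycles < MIN_WAIT_CYCLES:
            raise ValueError("wait cycles must be at least 5")
        self.wait_cycles = wait_cycles
        self.state = BootState.BOOT_IDLE
        self.timer = 0
        self.fw_reset = False  # distinguishes the two uses of BOOT_WAIT

    def step(self, i: Inputs) -> BootState:
        """Advance one clock and return the new state."""
        s = self.state
        if s is BootState.BOOT_IDLE and i.pwrgood and i.rst_b:
            self.state = BootState.BOOT_FUSE
        elif s is BootState.BOOT_FUSE and i.fuse_done:
            self.state = BootState.BOOT_WAIT if i.breakpoint else BootState.BOOT_DONE
        elif s is BootState.BOOT_DONE and i.fw_update_reset:
            self.state = BootState.BOOT_FWRST
        elif s is BootState.BOOT_FWRST:
            self.timer = self.wait_cycles  # core reset asserted, start timer
            self.fw_reset = True
            self.state = BootState.BOOT_WAIT
        elif s is BootState.BOOT_WAIT:
            if self.fw_reset:
                self.timer -= 1
                if self.timer == 0:
                    self.fw_reset = False
                    self.state = BootState.BOOT_DONE
            elif i.bootfsm_go:
                self.state = BootState.BOOT_DONE
        return self.state

    @property
    def core_in_reset(self) -> bool:
        """The RISC-V core is held in reset in every state except BOOT_DONE."""
        return self.state is not BootState.BOOT_DONE
```

Reusing BOOT_WAIT for both the breakpoint halt and the FW-reset timer mirrors the text; the `fw_reset` flag is just one way a model can tell the two apart.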
66	75	## RISC-V core
67	76
@@ -116,8 +125,9 @@
116	125	\| Data Vault \| 5 \| 8 KiB \| 0x1001_C000 \| 0x1001_DFFF \|
117	126	\| SHA512 \| 6 \| 32 KiB \| 0x1002_0000 \| 0x1002_7FFF \|
118	127	\| SHA256 \| 10 \| 32 KiB \| 0x1002_8000 \| 0x1002_FFFF \|
119		-\| ML-DSA \| 14 \| 64 KiB \| 0x1003_0000 \| 0x1003_FFFF \|
	128	+\| ABR (MLDSA/MLKEM) \| 14 \| 64 KiB \| 0x1003_0000 \| 0x1003_FFFF \|
120	129	\| AES \| 15 \| 4 KiB \| 0x1001_1000 \| 0x1001_1FFF \|
	130	+\| SHA3 \| 16 \| 4 KiB \| 0x1004_0000 \| 0x1004_0FFF \|
121	131
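The address ranges in the table above can be sanity-checked with a small decoder. This Python sketch transcribes only the rows shown in this hunk (the full memory map has more targets), and the `Region`, `decode`, and `no_overlap` names are hypothetical.

```python
from typing import List, NamedTuple, Optional


class Region(NamedTuple):
    name: str
    start: int
    end: int  # inclusive, as in the table


# Rows visible in this hunk only.
REGIONS: List[Region] = [
    Region("Data Vault", 0x1001_C000, 0x1001_DFFF),
    Region("SHA512", 0x1002_0000, 0x1002_7FFF),
    Region("SHA256", 0x1002_8000, 0x1002_FFFF),
    Region("ABR (MLDSA/MLKEM)", 0x1003_0000, 0x1003_FFFF),
    Region("AES", 0x1001_1000, 0x1001_1FFF),
    Region("SHA3", 0x1004_0000, 0x1004_0FFF),
]


def size_kib(r: Region) -> int:
    """Region size in KiB, computed from the inclusive end address."""
    return (r.end - r.start + 1) // 1024


def decode(addr: int) -> Optional[Region]:
    """Return the region containing addr, or None if unmapped in this subset."""
    for r in REGIONS:
        if r.start <= addr <= r.end:
            return r
    return None


def no_overlap(regions: List[Region]) -> bool:
    """True if no two regions share an address."""
    ordered = sorted(regions, key=lambda r: r.start)
    return all(a.end < b.start for a, b in zip(ordered, ordered[1:]))
```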
122	132
123	133	#### Peripherals subsystem
@@ -196,8 +206,8 @@
196	206	\| Mailbox (Notifications) \| 20 \| 7 \|
197	207	\| SHA512 Accelerator (Errors) \| 23 \| 8 \|
198	208	\| SHA512 Accelerator (Notifications) \| 24 \| 7 \|
199		-\| MLDSA (Errors) \| 23 \| 8 \|
200		-\| MLDSA (Notifications) \| 24 \| 7 \|
	209	+\| ABR (MLDSA/MLKEM) (Errors) \| 23 \| 8 \|
	210	+\| ABR (MLDSA/MLKEM) (Notifications) \| 24 \| 7 \|
201	211	\| AXI DMA (Errors) \| 25 \| 8 \|
202	212	\| AXI DMA (Notifications) \| 26 \| 7 \|
203	213
@@ -220,7 +230,7 @@
220	230
221	231	The following figure shows the two timers.
222	232
223		-Figure 3: Caliptra Watchdog Timer
	233	+Figure: Caliptra Watchdog Timer
224	234
225	235	![](../images/caliptra-rtl/docs/images/WDT.png)
226	236
@@ -358,7 +368,7 @@
358	368
359	369	The following figure shows the timing information for clock gating.
360	370
361		-Figure 10: Clock gating timing
	371	+Figure: Clock gating timing
362	372
363	373	![](../images/caliptra-rtl/docs/images/clock_gating_timing.png)
364	374
@@ -372,19 +382,19 @@
372	382
373	383	The following figure shows the integrated TRNG block.
374	384
375		-Figure 11: Integrated TRNG block
	385	+Figure: Integrated TRNG block
376	386
377	387	![](../images/caliptra-rtl/docs/images/integrated_TRNG.png)
378	388
379	389	The following figure shows the CSRNG block.
380	390
381		-Figure 12: CSRNG block
	391	+Figure: CSRNG block
382	392
383	393	![](../images/caliptra-rtl/docs/images/CSRNG_block.png)
384	394
385	395	The following figure shows the entropy source block.
386	396
387		-Figure 13: Entropy source block
	397	+Figure: Entropy source block
388	398
389	399	![](../images/caliptra-rtl/docs/images/entropy_source_block.png)
390	400
@@ -450,7 +460,7 @@
450	460
451	461	The following figure shows the top level signals defined in caliptra\_top.
452	462
453		-Figure 14: caliptra\_top signals
	463	+Figure: caliptra\_top signals
454	464
455	465	![](../images/caliptra-rtl/docs/images/caliptra_top_signals.png)
456	466
@@ -472,7 +482,7 @@
472	482
473	483	The following figure shows the entropy source signals.
474	484
475		-Figure 15: Entropy source signals
	485	+Figure: Entropy source signals
476	486
477	487	![](../images/caliptra-rtl/docs/images/entropy_source_signals.png)
478	488
@@ -634,7 +644,54 @@
634	644
635	645	Note: If the debug security state switches to debug mode anytime, the security assets and keys are still flushed even though JTAG is not open.
636	646
637		-Figure 16: JTAG implementation
	647	+The following table lists the JTAG alias addresses of the soc_ifc registers that are accessible through JTAG.
	648	+The Debug Locked registers are a subset of the Debug Unlocked registers; they are accessible when debug intent is set, when debug is unlocked, or when the lifecycle state is DEVICE_MANUFACTURING.
	649	+The Debug Unlocked registers are accessible when debug is unlocked or when the lifecycle state is DEVICE_MANUFACTURING.
	650	+
	651	+\| Register Name \| JTAG Address \| Accessibility \| Debug Locked \| Debug Unlocked \|
	652	+\| ------------------------------------------- \| -------------- \| --------------- \| -------------- \| ---------------- \|
	653	+\| mbox_lock \| 7'h75 \| RO \| YES \| YES \|
	654	+\| mbox_cmd \| 7'h76 \| RW \| YES \| YES \|
	655	+\| mbox_dlen \| 7'h50 \| RW \| YES \| YES \|
	656	+\| mbox_dataout \| 7'h51 \| RO \| YES \| YES \|
	657	+\| mbox_datain \| 7'h62 \| WO \| YES \| YES \|
	658	+\| mbox_status \| 7'h52 \| RW \| YES \| YES \|
	659	+\| mbox_execute \| 7'h77 \| WO \| YES \| YES \|
	660	+\| CPTRA_BOOT_STATUS \| 7'h53 \| RO \| YES \| YES \|
	661	+\| CPTRA_HW_ERROR_ENC \| 7'h54 \| RO \| YES \| YES \|
	662	+\| CPTRA_FW_ERROR_ENC \| 7'h55 \| RO \| YES \| YES \|
	663	+\| SS_UDS_SEED_BASE_ADDR_L \| 7'h56 \| RO \| NO \| YES \|
	664	+\| SS_UDS_SEED_BASE_ADDR_H \| 7'h57 \| RO \| NO \| YES \|
	665	+\| CPTRA_HW_ERROR_FATAL \| 7'h58 \| RO \| YES \| YES \|
	666	+\| CPTRA_FW_ERROR_FATAL \| 7'h59 \| RO \| YES \| YES \|
	667	+\| CPTRA_HW_ERROR_NON_FATAL \| 7'h5a \| RO \| YES \| YES \|
	668	+\| CPTRA_FW_ERROR_NON_FATAL \| 7'h5b \| RO \| YES \| YES \|
	669	+\| CPTRA_DBG_MANUF_SERVICE_REG \| 7'h60 \| RW \| YES \| YES \|
	670	+\| CPTRA_BOOTFSM_GO \| 7'h61 \| RW \| YES \| YES \|
	671	+\| SS_DEBUG_INTENT \| 7'h63 \| RW \| NO \| YES \|
	672	+\| SS_CALIPTRA_BASE_ADDR_L \| 7'h64 \| RW \| NO \| YES \|
	673	+\| SS_CALIPTRA_BASE_ADDR_H \| 7'h65 \| RW \| NO \| YES \|
	674	+\| SS_MCI_BASE_ADDR_L \| 7'h66 \| RW \| NO \| YES \|
	675	+\| SS_MCI_BASE_ADDR_H \| 7'h67 \| RW \| NO \| YES \|
	676	+\| SS_RECOVERY_IFC_BASE_ADDR_L \| 7'h68 \| RW \| NO \| YES \|
	677	+\| SS_RECOVERY_IFC_BASE_ADDR_H \| 7'h69 \| RW \| NO \| YES \|
	678	+\| SS_OTP_FC_BASE_ADDR_L \| 7'h6A \| RW \| NO \| YES \|
	679	+\| SS_OTP_FC_BASE_ADDR_H \| 7'h6B \| RW \| NO \| YES \|
	680	+\| SS_STRAP_GENERIC_0 \| 7'h6C \| RW \| NO \| YES \|
	681	+\| SS_STRAP_GENERIC_1 \| 7'h6D \| RW \| NO \| YES \|
	682	+\| SS_STRAP_GENERIC_2 \| 7'h6E \| RW \| NO \| YES \|
	683	+\| SS_STRAP_GENERIC_3 \| 7'h6F \| RW \| NO \| YES \|
	684	+\| SS_DBG_SERVICE_REG_REQ \| 7'h70 \| RW \| YES \| YES \|
	685	+\| SS_DBG_SERVICE_REG_RSP \| 7'h71 \| RO \| YES \| YES \|
	686	+\| SS_DBG_UNLOCK_LEVEL0 \| 7'h72 \| RW \| NO \| YES \|
	687	+\| SS_DBG_UNLOCK_LEVEL1 \| 7'h73 \| RW \| NO \| YES \|
	688	+\| SS_STRAP_CALIPTRA_DMA_AXI_USER \| 7'h74 \| RW \| NO \| YES \|
	689	+\| SS_EXTERNAL_STAGING_AREA_BASE_ADDR_L \| 7'h78 \| RW \| NO \| YES \|
	690	+\| SS_EXTERNAL_STAGING_AREA_BASE_ADDR_H \| 7'h79 \| RW \| NO \| YES \|
	691	+
	692	+
	693	+
	694	+Figure: JTAG implementation
638	695
639	696	![](../images/caliptra-rtl/docs/images/JTAG_implementation.png)
640	697
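The accessibility rules in the JTAG alias table above can be expressed as a small policy check. In this Python sketch, `JtagReg`, `jtag_visible`, and `jtag_can` are hypothetical names, only a handful of rows are transcribed, and denying all access when none of debug intent, debug unlock, or DEVICE_MANUFACTURING applies is an assumption the text does not state.

```python
from typing import NamedTuple


class JtagReg(NamedTuple):
    addr: int            # 7-bit JTAG alias address
    access: str          # "RO", "WO", or "RW"
    debug_locked: bool   # YES in the Debug Locked column


# A few rows transcribed from the table (not the full list).
JTAG_REGS = {
    "mbox_lock": JtagReg(0x75, "RO", True),
    "mbox_datain": JtagReg(0x62, "WO", True),
    "CPTRA_BOOTFSM_GO": JtagReg(0x61, "RW", True),
    "SS_DEBUG_INTENT": JtagReg(0x63, "RW", False),
    "SS_UDS_SEED_BASE_ADDR_L": JtagReg(0x56, "RO", False),
    "SS_DBG_SERVICE_REG_REQ": JtagReg(0x70, "RW", True),
}


def jtag_visible(name: str, *, debug_intent: bool, debug_unlocked: bool,
                 manufacturing: bool) -> bool:
    """Is the register reachable over JTAG in the given debug state?"""
    reg = JTAG_REGS[name]
    if debug_unlocked or manufacturing:
        return True  # every row is YES under Debug Unlocked
    if debug_intent:
        return reg.debug_locked  # only the Debug Locked subset
    return False  # assumption: no JTAG register access otherwise


def jtag_can(name: str, op: str, **state: bool) -> bool:
    """Combine visibility with the RO/WO/RW accessibility column."""
    allowed = {"read": ("RO", "RW"), "write": ("WO", "RW")}[op]
    return JTAG_REGS[name].access in allowed and jtag_visible(name, **state)
```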
@@ -644,18 +701,19 @@
644	701
645	702	* Symmetric cryptographic primitives
646	703	* De-obfuscation engine
647		- * SHA512/384 (based on NIST FIPS 180-4 [2])
648		- * SHA256 (based on NIST FIPS 180-4 [2])
649		- * HMAC512 (based on [NIST FIPS 198-1](https://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.198-1.pdf) [5] and [RFC 4868](https://tools.ietf.org/html/rfc4868) [6])
	704	+ * SHA512/384 (based on NIST FIPS 180-4 [2])
	705	+ * SHA256 (based on NIST FIPS 180-4 [2])
	706	+ * HMAC512 (based on [NIST FIPS 198-1](https://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.198-1.pdf) [5] and [RFC 4868](https://tools.ietf.org/html/rfc4868) [6])
	707	+ * SHA3 (based on [NIST FIPS 202](https://doi.org/10.6028/NIST.FIPS.202) [17])
650	708	* Public-key cryptography
651		- * NIST Secp384r1 Deterministic Digital Signature Algorithm (based on FIPS-186-4 [11] and RFC 6979 [7])
	709	+ * NIST Secp384r1 Deterministic Digital Signature Algorithm (based on FIPS-186-4 [11] and RFC 6979 [7])
652	710	* Key vault
653		- * Key slots
654		- * Key slot management
	711	+ * Key slots
	712	+ * Key slot management
655	713
656	714	The high-level architecture of Caliptra cryptographic subsystem is shown in the following figure.
657	715
658		-Figure 17: Caliptra cryptographic subsystem
	716	+Figure: Caliptra cryptographic subsystem
659	717
660	718	![](../images/caliptra-rtl/docs/images/Crypto-2p0.png)
661	719
@@ -680,7 +738,7 @@
680	738
681	739	The total size should be equal to 128 bits short of a multiple of 1024 since the goal is to have the formatted message size as a multiple of 1024 bits (N x 1024). The following figure shows the SHA512 input formatting.
682	740
683		-Figure 18: SHA512 input formatting
	741	+Figure: SHA512 input formatting
684	742
685	743	![](../images/caliptra-rtl/docs/images/SHA512_input.png)
686	744
@@ -692,7 +750,7 @@
692	750
693	751	The SHA512 architecture has the finite-state machine as shown in the following figure.
694	752
695		-Figure 19: SHA512 FSM
	753	+Figure: SHA512 FSM
696	754
697	755	![](../images/caliptra-rtl/docs/images/SHA512_fsm.png)
698	756
@@ -722,7 +780,7 @@
722	780
723	781	The following pseudocode demonstrates how the SHA512 interface can be implemented.
724	782
725		-Figure 20: SHA512 pseudocode
	783	+Figure: SHA512 pseudocode
726	784
727	785	![](../images/caliptra-rtl/docs/images/SHA512_pseudo.png)
728	786
@@ -803,7 +861,7 @@
803	861
804	862	The following figure shows SHA256 input formatting.
805	863
806		-Figure 21: SHA256 input formatting
	864	+Figure: SHA256 input formatting
807	865
808	866	![](../images/caliptra-rtl/docs/images/SHA256_input.png)
809	867
@@ -815,7 +873,7 @@
815	873
816	874	The SHA256 architecture has the finite-state machine as shown in the following figure.
817	875
818		-Figure 22: SHA256 FSM
	876	+Figure: SHA256 FSM
819	877
820	878	![](../images/caliptra-rtl/docs/images/SHA256_fsm.png)
821	879
@@ -850,7 +908,7 @@
850	908
851	909	The following pseudocode demonstrates how the SHA256 interface can be implemented.
852	910
853		-Figure 23: SHA256 pseudocode
	911	+Figure: SHA256 pseudocode
854	912
855	913	![](../images/caliptra-rtl/docs/images/SHA256_pseudo.png)
856	914
@@ -890,6 +948,164 @@
890	948	\| 1 KiB message \| 8761 \| 21.90 \| 45,657 \|
891	949
892	950
	951	+## SHA3
	952	+
	953	+The SHA3 HWIP performs hash functions, which are used to check the integrity of received messages.
	954	+It supports the SHA3 hash functions as well as the SHA3 extendable-output functions (XOFs), known as SHAKE.
	955	+The operation, known as the _sponge construction_, is described in the [SHA3 specification, FIPS 202](https://csrc.nist.gov/publications/detail/fips/202/final).
	956	+It has been adapted from OpenTitan; documentation describing the functionality of the KMAC block it was derived from is available [here](https://opentitan.org/book/hw/ip/kmac/index.html).
	957	+In the current use cases of the SHA3 HWIP, either (a) messages are not considered secret (External Mu), or (b) side-channel analysis (SCA) hardening would not be meaningful (HPKE in OCP L.O.C.K.); hence there are no SCA requirements.
	958	+
	959	+### Features
	960	+- Support for SHA3-224, 256, 384, 512, SHAKE[128, 256] and cSHAKE[128, 256]
	961	+- Support byte-granularity on input message
	962	+- Support arbitrary output length for SHAKE, cSHAKE
	963	+- Support customization input string S, and function-name N up to 36 bytes total
	964	+- 64b x 10 depth Message FIFO
	965	+- Performance (at 100 MHz):
	966	+ - SHA3-224: 2.93 B/cycle, 2.34 Gbit/s - 1.19 B/cycle, 952 Mbit/s (DOM)
	967	+ - SHA3-512: 1.47 B/cycle, 1.18 Gbit/s - 0.59 B/cycle, 472 Mbit/s (DOM)
	968	+
	969	+### Design Details
	970	+
	971	+#### Keccak Round
	972	+
	973	+A Keccak round implements the Keccak_f function described in the SHA3 specification.
	974	+Keccak round logic in the SHA3 HWIP supports not only a 1600-bit internal state but all possible widths {25, 50, 100, 200, 400, 800, 1600}, selected by the parameter `Width`.
	975	+Keccak permutations in the specification allow an arbitrary number of rounds.
	976	+This module, however, supports Keccak_f, which always runs `12 + 2*L` rounds, where \\[ L = \log_2 \left( \frac{Width}{25} \right). \\]
	977	+For instance, a 200-bit internal state runs 18 rounds.
	978	+SHA3 instantiates the Keccak round module with a 1600-bit state, so each permutation runs 24 rounds.
	979	+
	980	+![](../images/caliptra-rtl/docs/images/sha3-keccak-round.svg)
	981	+
	982	+Keccak round logic has two internal phases.
	983	+The Theta, Rho, and Pi functions are executed in the 1st phase.
	984	+The Chi and Iota functions run in the 2nd phase.
	985	+Without masking, the first phase and the second phase run in the same cycle.
	986	+
	987	+With domain-oriented masking (DOM), to save circuit area, the Chi function uses 800 instead of 1600 DOM multipliers, but the multipliers are fully pipelined.
	988	+The Chi and Iota functions are thus separately applied to the two halves of the state, and the 2nd phase takes a total of three clock cycles to complete.
	989	+In the first clock cycle of the 2nd phase, the first stage of Chi is computed for the first lane halves of the state.
	990	+In the second clock cycle, the new first lane halves are output and written to the state register.
	991	+At the same time, the first stage of Chi is computed for the second lane halves.
	992	+In the third clock cycle, the new second lane halves are output and written to the state register.
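The round-count formula given for the Keccak round module can be checked for every supported `Width`. A minimal Python sketch (the function name is hypothetical):

```python
VALID_WIDTHS = (25, 50, 100, 200, 400, 800, 1600)


def keccak_rounds(width: int) -> int:
    """Rounds of Keccak-f[width]: 12 + 2*L with L = log2(width / 25)."""
    if width not in VALID_WIDTHS:
        raise ValueError(f"unsupported Keccak width: {width}")
    lane_bits = width // 25            # lane size w = 2**L
    l = lane_bits.bit_length() - 1     # exact log2 of a power of two
    return 12 + 2 * l


if __name__ == "__main__":
    for w in VALID_WIDTHS:
        print(f"Width={w:4d}  lane={w // 25:2d} bits  rounds={keccak_rounds(w)}")
```

Using `bit_length()` instead of `math.log2` keeps the computation exact in integer arithmetic.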
	993	+
	994	+#### Padding for Keccak
	995	+
	996	+Padding logic supports SHA3/SHAKE/cSHAKE algorithms.
	997	+cSHAKE needs extra inputs for the function name `N` and the customization string `S`.
	998	+Other than that, SHA3, SHAKE, and cSHAKE share a similar datapath inside the padding module, except for the suffix appended after the end of the message.
	999	+SHA3 appends `2'b10`, SHAKE appends `4'b1111`, and cSHAKE appends `2'b00`; `pad10*1()` then follows.
	1000	+All are little-endian values.
	1001	+
	1002	+The interface between this padding logic and the MSG_FIFO follows the conventional FIFO interface,
	1003	+so `caliptra_prim_fifo_*` can talk to the padding logic directly.
	1004	+This module talks to Keccak round logic with a more memory-like interface.
	1005	+The interface has an additional address signal on top of the valid, ready, and data signals.
	1006	+
	1007	+![](../images/caliptra-rtl/docs/images/sha3-padding.svg)
	1008	+
	1009	+The hashing process begins when the software issues the Start command to `CMD`.
	1010	+If cSHAKE is enabled, the padding logic expands the prefix value (`N \|\| S` above) to a full block.
	1011	+The block size is determined by `CFG_SHADOWED.kstrength`.
	1012	+If the value is 128, the block size is 168 bytes.
	1013	+If it is 256, the block size is 136 bytes.
	1014	+The expanded prefix value is transmitted to the Keccak round logic.
	1015	+After sending the full block, the padding logic triggers the Keccak round logic to run a full 24 rounds.
	1016	+
	1017	+If the mode is not cSHAKE, or if the mode is cSHAKE and the prefix block has been processed, the padding logic accepts the incoming message bitstream and forwards the data to the Keccak round logic at block granularity.
	1018	+The padding logic controls the data flow and makes the Keccak logic run after each full block is sent.
	1019	+
	1020	+After the software writes the message bitstream, it should issue the Process command to the `CMD` register.
	1021	+After receiving the Process command, the padding logic appends the proper ending bits according to the `CFG_SHADOWED.mode` value.
	1022	+The logic then writes 0s up to the block size to the Keccak round logic and ends with a 1 in the last bit of the block.
	1023	+
	1024	+![](../images/caliptra-rtl/docs/images/sha3-padding-fsm.svg)
	1025	+
	1026	+After the Keccak round completes the last block, the padding logic asserts an `absorbed` signal to notify the software.
	1027	+At this point, the software is able to read the digest in `STATE` memory region.
	1028	+If the output length is greater than the Keccak block rate in SHAKE or cSHAKE mode, the software may run the Keccak round manually by issuing the Run command to the `CMD` register.
	1029	+
	1030	+The software completes the operation by issuing the Done command after reading the digest.
	1031	+The padding logic clears internal variables and goes back to the Idle state.
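To illustrate the absorb, pad, and squeeze flow described for the padding logic, here is a minimal software model of the sponge, checked against Python's `hashlib`. It is a reference sketch, not a model of the HWIP datapath: it works on bytes rather than 64-bit FIFO words, and it folds the mode-specific ending bits and the first `1` of `pad10*1` into one suffix byte (`0x06` for SHA3, `0x1F` for SHAKE, `0x04` for cSHAKE), which is how the little-endian bit strings in the text look when packed LSB-first. The Keccak-f[1600] code follows the compact style of the Keccak team's FIPS 202 Python reference; all function names here are hypothetical.

```python
import hashlib

MASK64 = (1 << 64) - 1

# Mode suffix bits plus the first '1' of pad10*1, packed LSB-first into a byte.
SUFFIX = {"sha3": 0x06, "shake": 0x1F, "cshake": 0x04}


def rol64(a: int, n: int) -> int:
    n %= 64
    return ((a << n) | (a >> (64 - n))) & MASK64


def keccak_f1600(state: bytearray) -> bytearray:
    """One Keccak-f[1600] permutation (24 rounds) on a 200-byte state."""
    lanes = [[int.from_bytes(state[8 * (x + 5 * y):8 * (x + 5 * y) + 8], "little")
              for y in range(5)] for x in range(5)]
    r = 1  # LFSR that generates the iota round constants
    for _ in range(24):
        # 1st phase: theta, rho, pi
        c = [lanes[x][0] ^ lanes[x][1] ^ lanes[x][2] ^ lanes[x][3] ^ lanes[x][4]
             for x in range(5)]
        d = [c[(x + 4) % 5] ^ rol64(c[(x + 1) % 5], 1) for x in range(5)]
        lanes = [[lanes[x][y] ^ d[x] for y in range(5)] for x in range(5)]
        x, y = 1, 0
        current = lanes[x][y]
        for t in range(24):
            x, y = y, (2 * x + 3 * y) % 5
            current, lanes[x][y] = lanes[x][y], rol64(current, (t + 1) * (t + 2) // 2)
        # 2nd phase: chi, iota
        for y in range(5):
            row = [lanes[x][y] for x in range(5)]
            for x in range(5):
                lanes[x][y] = row[x] ^ (~row[(x + 1) % 5] & row[(x + 2) % 5])
        for j in range(7):
            r = ((r << 1) ^ ((r >> 7) * 0x71)) % 256
            if r & 2:
                lanes[0][0] ^= 1 << ((1 << j) - 1)
    out = bytearray(200)
    for x in range(5):
        for y in range(5):
            out[8 * (x + 5 * y):8 * (x + 5 * y) + 8] = lanes[x][y].to_bytes(8, "little")
    return out


def sponge(rate: int, msg: bytes, suffix: int, out_len: int) -> bytes:
    """Absorb msg at `rate` bytes per block, pad, then squeeze out_len bytes."""
    state = bytearray(200)
    full = len(msg) // rate
    for b in range(full):                   # absorb full blocks
        for k in range(rate):
            state[k] ^= msg[b * rate + k]
        state = keccak_f1600(state)
    tail = msg[full * rate:]                # last partial block, maybe empty
    for k, byte in enumerate(tail):
        state[k] ^= byte
    state[len(tail)] ^= suffix              # mode bits + first pad bit
    state[rate - 1] ^= 0x80                 # final '1' of pad10*1
    state = keccak_f1600(state)
    out = bytearray()
    while True:
        out += state[:rate]                 # read the rate part of the state
        if len(out) >= out_len:
            return bytes(out[:out_len])
        state = keccak_f1600(state)         # analogous to the Run command


def rate_bytes(strength: int) -> int:
    """Block size in bytes: the capacity is twice the security strength."""
    return (1600 - 2 * strength) // 8


def sha3(bits: int, msg: bytes) -> bytes:
    return sponge(rate_bytes(bits), msg, SUFFIX["sha3"], bits // 8)


def shake(strength: int, msg: bytes, out_len: int) -> bytes:
    return sponge(rate_bytes(strength), msg, SUFFIX["shake"], out_len)


if __name__ == "__main__":
    assert sha3(256, b"abc") == hashlib.sha3_256(b"abc").digest()
    assert shake(128, b"abc", 200) == hashlib.shake_128(b"abc").digest(200)
    print(rate_bytes(128), rate_bytes(256))  # prints: 168 136
```

`rate_bytes(128)` and `rate_bytes(256)` reproduce the 168-byte and 136-byte block sizes selected by `CFG_SHADOWED.kstrength`, and requesting 200 bytes of SHAKE128 output exercises a second squeeze, like the manual Run command.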
	1032	+
	1033	+#### Message FIFO
	1034	+
	1035	+The SHA3 HWIP contains a message FIFO whose depth is configurable at compile time.
	1036	+The message FIFO receives incoming message bitstream regardless of its byte position in a word.
	1037	+Then it packs the partial message bytes into the internal 64 bit data width.
	1038	+After packing the data, the logic stores the data into the FIFO until the internal SHA3 engine consumes the data.
	1039	+
	1040	+##### FIFO Depth calculation
	1041	+
	1042	+The depth of the message FIFO is chosen to cover the throughput of the software or other producers such as a DMA engine.
	1043	+The size of the message FIFO is enough to hold the incoming data while the SHA3 engine is processing the previous block.
	1044	+Default design parameters assume the system characteristics as below:
	1045	+
	1046	+- `kmac_pkg::RegLatency`: The register write takes 5 cycles.
	1047	+- `kmac_pkg::Sha3Latency`: Keccak round latency takes 24 cycles.
	1048	+
	1049	+##### Empty and Full status
	1050	+
	1051	+Under normal operating conditions, the SHA3 engine processes data much faster than software can push it to the Message FIFO.
	1052	+The Message FIFO depth observable from `STATUS.fifo_depth` will remain 0 while the `STATUS.fifo_empty` status bit is lowered for one clock cycle whenever software provides new data.
	1053	+
	1054	+While the SHA3 engine is busy processing a block, data can accumulate in the Message FIFO; once the engine starts popping the data again, the FIFO eventually runs empty and the `fifo_empty` status interrupt fires.
	1055	+Note that the `fifo_empty` status interrupt will not fire if i) one of the hardware application interfaces is using the SHA3 block, ii) the SHA3 core is not in the `Absorb` state, or iii) after software has written the `Process` command.
	1056	+
	1057	+If software pushes data to the Message FIFO while it is full, the write operation is blocked until there is again space in the FIFO.
	1058	+This means the processor is effectively stalled.
	1059	+If the SHA3 engine is currently running and software fills up the Message FIFO, the resulting stall won't take more than 100 clock cycles.
	1060	+The stall mechanism prevents data loss and the upper bound on the wait time avoids software needing to poll the `STATUS.fifo_depth` field before writing data.
	1061	+
	1062	+### Programmer's guide
	1063	+
	1064	+The software can update the SHA3 configurations only when the IP is in the idle state.
	1065	+The software should check `STATUS.sha3_idle` before updating the configurations.
	1066	+The software must first program `CFG_SHADOWED.msg_endianness` and `CFG_SHADOWED.state_endianness` at the initialization stage.
	1067	+These determine the byte order of incoming messages (msg_endianness) and the Keccak state output (state_endianness).
	1068	+
	1069	+#### Software Initiated SHA3 process
	1070	+
	1071	+This section describes the expected software process to run the SHA3 HWIP.
	1072	+At first, the software configures `CFG_SHADOWED.kmac_en` for the operation.
	1073	+For cSHAKE operation, the software should configure `CFG_SHADOWED.mode` to cSHAKE and `CFG_SHADOWED.kstrength` to 128-bit or 256-bit security strength.
	1074	+The software also updates `PREFIX` registers if cSHAKE mode is used.
	1075	+The current design does not convert cSHAKE mode to SHAKE even if `PREFIX` is an empty string.
	1076	+It is the software's responsibility to set `CFG_SHADOWED.mode` to SHAKE in case of an empty `PREFIX`.
	1077	+The SHA3 HWIP uses the `PREFIX` registers as is.
	1078	+This means that the software should write already-encoded values to `PREFIX`.
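If `PREFIX` expects the `encode_string()` encoding from NIST SP 800-185 (the `encode_string(N) || encode_string(S)` input that cSHAKE pads to a block), software could prepare the bytes as below. This is a hedged sketch: the exact byte layout the `PREFIX` registers expect should be confirmed against the register documentation, and `cshake_prefix` and its 36-byte limit check are illustrative assumptions based on the Features list.

```python
def left_encode(x: int) -> bytes:
    """NIST SP 800-185 left_encode: byte count n, then x as n big-endian bytes."""
    if x < 0:
        raise ValueError("x must be non-negative")
    n = max(1, (x.bit_length() + 7) // 8)
    return bytes([n]) + x.to_bytes(n, "big")


def encode_string(s: bytes) -> bytes:
    """NIST SP 800-185 encode_string: left_encode(bit length) || s."""
    return left_encode(8 * len(s)) + s


MAX_NS_BYTES = 36  # assumption: "N and S up to 36 bytes total" counts raw bytes


def cshake_prefix(n: bytes, s: bytes) -> bytes:
    """Hypothetical helper: encoded N || S as software might write it to PREFIX."""
    if not n and not s:
        raise ValueError("empty prefix: software should select SHAKE mode instead")
    if len(n) + len(s) > MAX_NS_BYTES:
        raise ValueError("N and S exceed the supported prefix length")
    return encode_string(n) + encode_string(s)
```

The empty-prefix check mirrors the rule above that software, not hardware, must switch to SHAKE when `PREFIX` would be empty.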
	1079	+
	1080	+After configuring, the software notifies the SHA3 engine to accept incoming messages by issuing the Start command to `CMD`.
	1081	+If the Start command is not issued, incoming messages are discarded.
	1082	+
	1083	+After the software pushes all messages, it issues the Process command to `CMD` so that the SHA3 engine completes the sponge absorbing process.
	1084	+The SHA3 hashing engine pads the incoming message as defined in the SHA3 specification.
	1085	+
	1086	+After the SHA3 engine completes the sponge absorbing step, it generates the `kmac_done` interrupt.
	1087	+Alternatively, the software can poll the `STATUS.squeeze` bit until it becomes 1.
	1088	+At this stage, the software may run the Keccak round manually.
	1089	+
	1090	+If the desired digest length is greater than the Keccak rate, the software reads the currently available Keccak state and then issues the Run command so that the Keccak round logic runs another full permutation (24 rounds).
	1091	+SHA3 does not raise an interrupt when the Keccak round logic completes a software-initiated manual run.
	1092	+The software should check the `STATUS.squeeze` field for the readiness of the `STATE` value.
	1093	+
	1094	+After the software reads all the digest values, it issues the Done command to the `CMD` register to clear the internal states.
	1095	+The Done command clears the Keccak state, the SHA3 FSM, and a few internal variables.
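The software-initiated sequence above can be summarized in driver form. The Python sketch below runs against `FakeSha3`, a hypothetical stand-in that records `CMD` writes; its method names (`status`, `write_cfg`, `command`, `write_prefix`, `push_msg`, `read_state`) are illustrative and do not correspond to a real Caliptra driver API. Only the command names (Start, Process, Run, Done) and the status fields come from the text.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class FakeSha3:
    """Hypothetical stand-in for the SHA3 registers; records CMD writes."""
    log: List[str] = field(default_factory=list)
    cfg: Dict[str, object] = field(default_factory=dict)

    def status(self, name: str) -> bool:
        return True  # pretend STATUS.sha3_idle / STATUS.squeeze are always set

    def write_cfg(self, **fields: object) -> None:
        self.cfg.update(fields)  # stands in for CFG_SHADOWED writes

    def command(self, cmd: str) -> None:
        self.log.append(cmd)  # stands in for a CMD register write

    def write_prefix(self, data: bytes) -> None:
        self.cfg["prefix"] = data

    def push_msg(self, data: bytes) -> None:
        self.cfg["msg_len"] = int(self.cfg.get("msg_len", 0)) + len(data)

    def read_state(self, nbytes: int) -> bytes:
        return bytes(nbytes)  # a real device returns digest bytes from STATE


def sha3_run(dev, mode: str, kstrength: int, msg: bytes, out_len: int,
             rate: int, prefix: Optional[bytes] = None) -> bytes:
    """Configure, Start, push, Process, read (Run as needed), then Done."""
    while not dev.status("sha3_idle"):  # configuration only while idle
        pass
    dev.write_cfg(mode=mode, kstrength=kstrength,
                  msg_endianness=0, state_endianness=0)
    if mode == "cshake":
        if not prefix:
            raise ValueError("empty PREFIX: software should select SHAKE instead")
        dev.write_prefix(prefix)
    dev.command("start")
    dev.push_msg(msg)
    dev.command("process")
    out = bytearray()
    while True:
        while not dev.status("squeeze"):  # or wait for the kmac_done interrupt
            pass
        out += dev.read_state(min(rate, out_len - len(out)))
        if len(out) >= out_len:
            break
        dev.command("run")  # digest longer than the rate: squeeze another block
    dev.command("done")
    return bytes(out)
```

For example, requesting 200 bytes of SHAKE128 output (rate 168 bytes) produces the command sequence Start, Process, Run, Done.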
	1096	+
	1097	+#### Endianness
	1098	+
	1099	+The SHA3 HWIP operates in little-endian mode.
	1100	+The internal SHA3 hashing engine receives data at 64-bit granularity.
	1101	+The data written to SHA3 is assumed to be little-endian.
	1102	+
	1103	+The software may write messages in big-endian order by setting `CFG_SHADOWED.msg_endianness`, and read the digest from `STATE` in big-endian order by setting `CFG_SHADOWED.state_endianness`.
	1104	+If an endianness bit is 1, the corresponding data is assumed to be big-endian,
	1105	+so the internal logic byte-swaps it.
	1106	+For example, when the software writes `0xDEADBEEF` with endianness as 1, the logic converts it to `0xEFBEADDE` then writes into MSG_FIFO.
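The `0xDEADBEEF` byte-swap example can be reproduced in a few lines. This Python sketch additionally assumes that the little-endian engine absorbs the least significant byte of each word first; `fifo_bytes` is a hypothetical helper for showing the resulting byte order.

```python
from typing import Iterable


def bswap32(word: int) -> int:
    """Reverse the byte order of a 32-bit word."""
    if not 0 <= word <= 0xFFFF_FFFF:
        raise ValueError("word must fit in 32 bits")
    return int.from_bytes(word.to_bytes(4, "big"), "little")


def to_msg_fifo(word: int, msg_endianness: bool) -> int:
    """Word as written into MSG_FIFO: swapped only when msg_endianness is set."""
    return bswap32(word) if msg_endianness else word


def fifo_bytes(words: Iterable[int], msg_endianness: bool) -> bytes:
    """Byte stream the little-endian engine absorbs (low byte first, assumed)."""
    return b"".join(to_msg_fifo(w, msg_endianness).to_bytes(4, "little")
                    for w in words)
```

With `msg_endianness` set, writing `0xDEADBEEF` makes the engine absorb the bytes `DE AD BE EF`, that is, the word read in big-endian order.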
	1107	+
	1108	+
893	1109	## HMAC512/HMAC384
894	1110
895	1111	Hash-based message authentication code (HMAC) is a cryptographic authentication technique that uses a hash function and a secret key. HMAC involves a cryptographic hash function and a secret cryptographic key. This implementation supports the HMAC512 variants HMAC-SHA-512-256 and HMAC-SHA-384-192 as specified in [NIST FIPS 198-1](https://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.198-1.pdf) [5]. The implementation is compatible with the HMAC-SHA-512-256 and HMAC-SHA-384-192 authentication and integrity functions defined in [RFC 4868](https://tools.ietf.org/html/rfc4868) [6].
@@ -916,25 +1132,25 @@
916	1132
917	1133	The total size should be equal to 128 bits, short of a multiple of 1024 because the goal is to have the formatted message size as a multiple of 1024 bits (N x 1024).
918	1134
919		-Figure 24: HMAC input formatting
	1135	+Figure: HMAC input formatting
920	1136
921	1137	![](../images/caliptra-rtl/docs/images/HMAC_input.png)
922	1138
923	1139	The following figures show examples of input formatting for different message lengths.
924	1140
925		-Figure 25: Message length of 1023 bits
	1141	+Figure: Message length of 1023 bits
926	1142
927	1143	![](../images/caliptra-rtl/docs/images/msg_1023.png)
928	1144
929	1145	When the message is 1023 bits long, padding is given in the next block along with message size.
930	1146
931		-Figure 26: 1 bit padding
	1147	+Figure: 1 bit padding
932	1148
933	1149	![](../images/caliptra-rtl/docs/images/1_bit.png)
934	1150
935	1151	When the message size is 895 bits, a padding of ‘1’ is also considered valid, followed by the message size.
936	1152
937		-Figure 27: Multi block message
	1153	+Figure: Multi block message
938	1154
939	1155	![](../images/caliptra-rtl/docs/images/msg_multi_block.png)
940	1156
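The padding rule above (append a `1`, then zeros until the length is 128 bits short of a multiple of 1024, then the 128-bit message length) reduces to modular arithmetic. This Python sketch, with a hypothetical helper name, reproduces the 895-bit and 1023-bit cases shown in the figures.

```python
from typing import Tuple

BLOCK_BITS = 1024
LEN_FIELD_BITS = 128


def sha512_padding(msg_bits: int) -> Tuple[int, int, int]:
    """Return (zero_bits, total_bits, blocks) for a msg_bits-long message.

    Layout: message || '1' || zero_bits x '0' || 128-bit message length.
    """
    if msg_bits < 0:
        raise ValueError("length must be non-negative")
    zero_bits = (BLOCK_BITS - LEN_FIELD_BITS - (msg_bits + 1)) % BLOCK_BITS
    total = msg_bits + 1 + zero_bits + LEN_FIELD_BITS
    return zero_bits, total, total // BLOCK_BITS
```

An 895-bit message needs no zero bits (one block), while a 1023-bit message fills the first block with the `1` bit and needs 896 zero bits plus the length in a second block.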
@@ -945,13 +1161,13 @@
945	1161
946	1162	The HMAC512 core performs the sha2-512 function to process the hash value of the given message. The algorithm processes each block of the 1024 bits from the message, using the result from the previous block. This data flow is shown in the following figure.
947	1163
948		-Figure 28: HMAC-SHA-512-256 data flow
	1164	+Figure: HMAC-SHA-512-256 data flow
949	1165
950	1166	![](../images/caliptra-rtl/docs/images/HMAC_SHA_512_256.png)
951	1167
952	1168	The HMAC384 core performs the sha2-384 function to process the hash value of the given message. The algorithm processes each block of the 1024 bits from the message, using the result from the previous block. This data flow is shown in the following figure.
953	1169
954		-Figure 29: HMAC-SHA-384-192 data flow
	1170	+Figure: HMAC-SHA-384-192 data flow
955	1171
956	1172	![](../images/caliptra-rtl/docs/images/HMAC_SHA_384_192.png)
957	1173
@@ -959,7 +1175,7 @@
959	1175
960	1176	The HMAC architecture has the finite-state machine as shown in the following figure.
961	1177
962		-Figure 30: HMAC FSM
	1178	+Figure: HMAC FSM
963	1179
964	1180	![](../images/caliptra-rtl/docs/images/HMAC_FSM.png)
965	1181
@@ -997,7 +1213,7 @@
997	1213
998	1214	The following pseudocode demonstrates how the HMAC interface can be implemented.
999	1215
1000		-Figure 31: HMAC pseudocode
	1216	+Figure: HMAC pseudocode
1001	1217
1002	1218	![](../images/caliptra-rtl/docs/images/HMAC_pseudo.png)
1003	1219
@@ -1122,7 +1338,7 @@
1122	1338
1123	1339	Secp384r1 parameters are shown in the following figure.
1124	1340
1125		-Figure 32: Secp384r1 parameters
	1341	+Figure: Secp384r1 parameters
1126	1342
1127	1343	![](../images/caliptra-rtl/docs/images/secp384r1_params.png)
1128	1344
@@ -1130,7 +1346,7 @@
1130	1346
1131	1347	The ECDSA consists of three operations, shown in the following figure.
1132	1348
1133		-Figure 33: ECDSA operations
	1349	+Figure: ECDSA operations
1134	1350
1135	1351	![](../images/caliptra-rtl/docs/images/ECDSA_ops.png)
1136	1352
@@ -1175,7 +1391,7 @@
1175	1391
1176	1392	The ECC top-level architecture is shown in the following figure.
1177	1393
1178		-Figure 34: ECC architecture
	1394	+Figure: ECC architecture
1179	1395
1180	1396	![](../images/caliptra-rtl/docs/images/ECC_arch.png)
1181	1397
@@ -1215,25 +1431,25 @@
1215	1431
1216	1432	#### KeyGen
1217	1433
1218		-Figure 35: KeyGen pseudocode
	1434	+Figure: KeyGen pseudocode
1219	1435
1220	1436	![](../images/caliptra-rtl/docs/images/keygen_pseudo.png)
1221	1437
1222	1438	#### Signing
1223	1439
1224		-Figure 36: Signing pseudocode
	1440	+Figure: Signing pseudocode
1225	1441
1226	1442	![](../images/caliptra-rtl/docs/images/signing_pseudo.png)
1227	1443
1228	1444	#### Verifying
1229	1445
1230		-Figure 37: Verifying pseudocode
	1446	+Figure: Verifying pseudocode
1231	1447
1232	1448	![](../images/caliptra-rtl/docs/images/verify_pseudo.png)
1233	1449
1234	1450	#### ECDH sharedkey
1235	1451
1236		-Figure 38: ECDH sharedkey pseudocode
	1452	+Figure: ECDH sharedkey pseudocode
1237	1453
1238	1454	![](../images/caliptra-rtl/docs/images/sharedkey_pseudo.png)
1239	1455
@@ -1299,7 +1515,7 @@
1299	1515	2. KEYGEN PRIVKEY: Running HMAC\_DRBG with seed and nonce to generate the privkey in KEYGEN operation.
1300	1516	3. SIGNING NONCE: Running HMAC\_DRBG based on RFC6979 in SIGNING operation with privkey and hashed\_msg.
1301	1517
1302		-Figure 39: HMAC\_DRBG utilization
	1518	+Figure: HMAC\_DRBG utilization
1303	1519
1304	1520	![](../images/caliptra-rtl/docs/images/HMAC_DRBG_util.png)
1305	1521
@@ -1315,7 +1531,7 @@
1315	1531
1316	1532	The data flow of the HMAC\_DRBG operation in keygen operation mode is shown in the following figure.
1317	1533
1318		-Figure 40: HMAC\_DRBG data flow
	1534	+Figure: HMAC\_DRBG data flow
1319	1535
1320	1536	![](../images/caliptra-rtl/docs/images/HMAC_DRBG_data.png)
1321	1537
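The HMAC_DRBG uses above follow the generic NIST SP 800-90A HMAC_DRBG construction, which RFC 6979 also builds on for deterministic signing nonces. The Python sketch below shows that generic DRBG. It assumes HMAC-SHA-384 to match secp384r1. The class name is hypothetical, and the sketch does not show the exact seeding, reseeding, or range reduction (for example, rejecting candidates outside [1, n-1]) performed by the Caliptra ECC core.

```python
import hashlib
import hmac


class HmacDrbg:
    """NIST SP 800-90A HMAC_DRBG (no reseed, no additional input), using HMAC-SHA-384."""

    OUTLEN = 48  # SHA-384 output length in bytes

    def __init__(self, entropy: bytes, nonce: bytes, personalization: bytes = b"") -> None:
        self.key = b"\x00" * self.OUTLEN
        self.v = b"\x01" * self.OUTLEN
        self._update(entropy + nonce + personalization)

    def _hmac(self, key: bytes, data: bytes) -> bytes:
        return hmac.new(key, data, hashlib.sha384).digest()

    def _update(self, provided: bytes = b"") -> None:
        # HMAC_DRBG_Update: mix provided data into the (K, V) state.
        self.key = self._hmac(self.key, self.v + b"\x00" + provided)
        self.v = self._hmac(self.key, self.v)
        if provided:
            self.key = self._hmac(self.key, self.v + b"\x01" + provided)
            self.v = self._hmac(self.key, self.v)

    def generate(self, nbytes: int) -> bytes:
        out = b""
        while len(out) < nbytes:
            self.v = self._hmac(self.key, self.v)  # each output block is one HMAC of V
            out += self.v
        self._update()  # refresh K and V afterwards (backtracking resistance)
        return out[:nbytes]
```

The same seed and nonce always reproduce the same output stream. This determinism is what lets RFC 6979 derive the signing nonce from the private key and the hashed message instead of from fresh randomness.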
@@ -1325,7 +1541,7 @@
1325	1541
1326	1542	In practice, observing a t-value greater than a specific threshold (typically 4.5) indicates the presence of leakage. However, ECC operations have a long latency, so around 5 million samples must be captured per trace. With this many samples, a threshold of 4.5 produces many false positives, which justifies a higher TVLA threshold. Based on the following figure from “Side-Channel Analysis and Countermeasure Design for Implementation of Curve448 on Cortex-M4” by Bisheh-Niasar et al., the threshold can be set to 7 in our case.
1327	1543
1328		-Figure 41: TVLA threshold as a function of the number of samples per trace
	1544	+Figure: TVLA threshold as a function of the number of samples per trace
1329	1545
1330	1546	![](../images/caliptra-rtl/docs/images/TVLA_threshold.png)
1331	1547
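The TVLA decision described above reduces to a per-sample Welch's t-test between a fixed-input set and a random-input set of traces. The Python sketch below assumes the traces are already aligned and stored as equal-length lists of samples. The function names are illustrative, and the `threshold` argument can be raised from 4.5 to 7, as argued above for long ECC traces.

```python
import math
from statistics import mean, variance


def welch_t(fixed: list, rand: list) -> float:
    """Welch's t-statistic between fixed-set and random-set samples at one time index."""
    se = math.sqrt(variance(fixed) / len(fixed) + variance(rand) / len(rand))
    return (mean(fixed) - mean(rand)) / se


def leaky_samples(fixed_traces: list, random_traces: list, threshold: float = 4.5) -> list:
    """Return the sample indices whose |t| exceeds the threshold."""
    n_samples = len(fixed_traces[0])
    return [
        i
        for i in range(n_samples)
        if abs(welch_t([t[i] for t in fixed_traces], [t[i] for t in random_traces])) > threshold
    ]


# Toy data: only sample index 1 differs systematically between the two sets.
fixed = [[1, 10, 3], [2, 11, 4], [3, 12, 5], [2, 10, 3]]
rand = [[2, 1, 4], [1, 2, 3], [3, 3, 5], [2, 1, 4]]
print(leaky_samples(fixed, rand))  # [1]
```
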
@@ -1335,7 +1551,7 @@
1335	1551	The TVLA results for seed/nonce-dependent leakage detection using 200,000 traces are shown in the following figure. Based on this figure, no leakage is detected in ECC keygen when the seed/nonce is varied over 200,000 operations.
1336	1552
1337	1553
1338		-Figure 42: seed/nonce-dependent leakage detection using TVLA for ECC keygen after 200,000 traces
	1554	+Figure: seed/nonce-dependent leakage detection using TVLA for ECC keygen after 200,000 traces
1339	1555
1340	1556	![](../images/caliptra-rtl/docs/images/tvla_keygen.png)
1341	1557
@@ -1343,13 +1559,13 @@
1343	1559
1344	1560	The TVLA results for privkey-dependent leakage detection using 20,000 traces are shown in the following figure. Based on this figure, no leakage is detected in ECC signing when the privkey is varied over 20,000 operations.
1345	1561
1346		-Figure 43: privkey-dependent leakage detection using TVLA for ECC signing after 20,000 traces
	1562	+Figure: privkey-dependent leakage detection using TVLA for ECC signing after 20,000 traces
1347	1563
1348	1564	![](../images/caliptra-rtl/docs/images/TVLA_privekey.png)
1349	1565
1350	1566	The TVLA results for message-dependent leakage detection using 64,000 traces are shown in the following figure. Based on this figure, no leakage is detected in ECC signing when the message is varied over 64,000 operations.
1351	1567
1352		-Figure 44: Message-dependent leakage detection using TVLA for ECC signing after 64,000 traces
	1568	+Figure: Message-dependent leakage detection using TVLA for ECC signing after 64,000 traces
1353	1569
1354	1570	![](../images/caliptra-rtl/docs/images/TVLA_msg_dependent.png)
1355	1571
@@ -1388,15 +1604,15 @@
1388	1604
1389	1605	LMS cryptography is a type of hash-based digital signature scheme that was standardized by NIST in 2020. It is based on the Leighton-Micali Signature (LMS) system, which uses a Merkle tree structure to combine many one-time signature (OTS) keys into a single public key. LMS cryptography is resistant to quantum attacks and can achieve a high level of security without relying on large integer mathematics.
1390	1606
1391		-Caliptra supports only LMS verification using a software/hardware co-design approach. Hence, the LMS accelerator reuses the SHA256 engine to speedup the Winternitz chain by removing software-hardware interface overhead. The LMS-OTS verification algorithm is shown in follwoing figure:
1392		-
1393		-Figure 45: LMS-OTS Verification algorithm
	1607	+Caliptra supports only LMS verification, using a software/hardware co-design approach. The LMS accelerator reuses the SHA256 engine to speed up the Winternitz chain by removing the software-hardware interface overhead. The LMS-OTS verification algorithm is shown in the following figure:
	1608	+
	1609	+Figure: LMS-OTS Verification algorithm
1394	1610
1395	1611	![](../images/caliptra-rtl/docs/images/LMS_verifying_alg.png)
1396	1612
1397	1613	The high-level architecture of LMS is shown in the following figure.
1398	1614
1399		-Figure 46: LMS high-level architecture
	1615	+Figure: LMS high-level architecture
1400	1616
1401	1617	![](../images/caliptra-rtl/docs/images/LMS_high_level.png)
1402	1618
@@ -1421,7 +1637,7 @@
1421	1637
1422	1638	The Winternitz hash chain can be accelerated in hardware to improve the performance of the design. For this purpose, a configurable architecture that reuses the SHA256 engine is proposed. The LMS accelerator architecture is shown in the following figure, where H denotes the SHA256 engine.
1423	1639
1424		-Figure 47: Winternitz chain architecture
	1640	+Figure: Winternitz chain architecture
1425	1641
1426	1642	![](../images/caliptra-rtl/docs/images/LMS_wntz_arch.png)
1427	1643
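The chain offloaded to the accelerator is the inner loop of LM-OTS signature verification in RFC 8554 (Algorithm 4b). Starting from a signature element, H is applied repeatedly with a per-step prefix until step 2^w - 2. The Python sketch below models one chain. It is a software reference for the loop, not the accelerator's interface. The optional truncation to n = 24 bytes assumes the SHA-256/192 parameter sets from NIST SP 800-208.

```python
import hashlib


def winternitz_chain(I: bytes, q: int, i: int, a: int, w: int, tmp: bytes, n: int = 32) -> bytes:
    """Finish one LM-OTS hash chain: apply H for steps a .. 2^w - 2 (RFC 8554, Algorithm 4b)."""
    for j in range(a, 2**w - 1):
        # tmp = H(I || u32str(q) || u16str(i) || u8str(j) || tmp)
        data = I + q.to_bytes(4, "big") + i.to_bytes(2, "big") + j.to_bytes(1, "big") + tmp
        tmp = hashlib.sha256(data).digest()[:n]  # n = 24 for the SHA-256/192 parameter sets
    return tmp
```

During verification, `a` is the Winternitz coefficient of the message digest for chain `i`. The chain outputs are then hashed together to form the candidate public key.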
@@ -1456,7 +1672,14 @@
1456	1672	Please refer to the [Adams-bridge specification](https://github.com/chipsalliance/adams-bridge/blob/main/docs/AdamsBridgeHardwareSpecification.md)
1457	1673
1458	1674	### Address map
1459		-Address map of ML-DSA accelerator is shown here: [ML-DSA\_reg — clp Reference (chipsalliance.github.io)](https://chipsalliance.github.io/caliptra-rtl/main/internal-regs/?p=clp.mldsa_reg)
	1675	+The address map of the ML-DSA accelerator is shown here: [ML-DSA\_reg — clp Reference (chipsalliance.github.io)](https://chipsalliance.github.io/caliptra-rtl/main/internal-regs/?p=clp.abr_reg)
	1676	+
	1677	+## Adams Bridge Kyber ML-KEM
	1678	+
	1679	+Please refer to the [Adams-bridge specification](https://github.com/chipsalliance/adams-bridge/blob/main/docs/AdamsBridgeHardwareSpecification.md)
	1680	+
	1681	+### Address map
	1682	+The address map of the ML-KEM accelerator is shown here: [ML-KEM\_reg — clp Reference (chipsalliance.github.io)](https://chipsalliance.github.io/caliptra-rtl/main/internal-regs/?p=clp.abr_reg)
1460	1683
1461	1684	## AES
1462	1685
@@ -1469,6 +1692,12 @@
1469	1692	### Operation
1470	1693
1471	1694	For more information, see the [AES Programmer's Guide](https://github.com/vogelpi/opentitan/blob/aes-gcm-review/hw/ip/aes/doc/programmers_guide.md).
	1695	+
	1696	+### AES Endian
	1697	+
	1698	+The AES core uses little-endian byte order for the DATA_IN and DATA_OUT registers. Caliptra allows a user to stream data into and out of AES in big-endian byte order when AES_CLP.CTRL0.ENDIAN_SWAP is set to 1. This is done by swizzling the write data when a write targets DATA_IN and the read data when a read targets DATA_OUT.
	1699	+
	1700	+By default, little endian is selected.
1472	1701
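The Python sketch below compares the two settings. It assumes that "swizzling" means reversing the four bytes within each 32-bit DATA_IN/DATA_OUT word. The helper names are hypothetical. The point is that a block packed big-endian with ENDIAN_SWAP = 1 reaches the AES core as the same little-endian words it would receive with ENDIAN_SWAP = 0.

```python
def bswap32(word: int) -> int:
    """Reverse the byte order of a 32-bit word."""
    return int.from_bytes(word.to_bytes(4, "little"), "big")


def data_in_words(block: bytes, endian_swap: bool) -> list:
    """Words firmware would write to DATA_IN_0..3 for one 16-byte block."""
    order = "big" if endian_swap else "little"
    return [int.from_bytes(block[k:k + 4], order) for k in range(0, 16, 4)]


def core_view(written: int, endian_swap: bool) -> int:
    """Value the AES core sees after the (assumed) per-dword byte swizzle."""
    return bswap32(written) if endian_swap else written


block = bytes(range(16))
le = data_in_words(block, endian_swap=False)
be = data_in_words(block, endian_swap=True)
assert [core_view(w, False) for w in le] == [core_view(w, True) for w in be]
```
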
1473	1702	### Signal descriptions
1474	1703
@@ -1482,7 +1711,7 @@
1482	1711	\| DATA_OUT \| output \| Output block result of encryption or decryption. Stored in four 32-bit registers. \|
1483	1712	\| CTRL_SHADOWED.MANUAL_OPERATION \| input \| Configures the AES core to operate in manual mode. \|
1484	1713	\| CTRL_SHADOWED.PRNG_RESEED_RATE \| input \| Configures the rate of reseeding the internal PRNG used for masking. \|
1485		-\| CTRL_SHADOWED.SIDELOAD \| input \| When asserted, AES core will use the key from the keyvault interface. \|
	1714	+\| CTRL_SHADOWED.SIDELOAD \| input \| When asserted, AES core will use the key from the key vault interface. \|
1486	1715	\| CTRL_SHADOWED.KEY_LEN \| input \| Configures the AES key length. Supports 128, 192, and 256-bit keys. \|
1487	1716	\| CTRL_SHADOWED.MODE \| input \| Configures the AES block cipher mode. \|
1488	1717	\| CTRL_SHADOWED.OPERATION \| input \| Configures the AES core to operate in encryption or decryption modes. \|
@@ -1797,19 +2026,19 @@
1797	2026	To underpin the results of the formal verification flow, the hardening of the GHASH module has been analyzed on the ChipWhisperer [CW310](https://rtfm.newae.com/Targets/CW310%20Bergen%20Board/) FPGA board.
1798	2027	For this analysis, power traces with the ChipWhisperer [Husky](https://rtfm.newae.com/Capture/ChipWhisperer-Husky/) scope were captured during GCM operations.
1799	2028	Afterwards, a Test Vector Leakage Assessment (TVLA) was performed with the [ot-sca toolset](https://github.com/lowRISC/ot-sca).
1800		-The setup is illustrated in Figure 1.
	2029	+The setup is illustrated in the following figure.
1801	2030
1802	2031	![](../images/caliptra-rtl/docs/images/cw310_cwhusky.jpeg)
1803	2032	:--:
1804		-Figure 1: Target CW310 FPGA board (left) and the CW Husky scope (right).
	2033	+Figure: Target CW310 FPGA board (left) and the CW Husky scope (right).
1805	2034
1806	2035	##### Setup
1807	2036
1808	2037	![](../images/caliptra-rtl/docs/images/GHASH_TVLA_Figure2.png)
1809	2038	:--:
1810		-Figure 2: Measurement setup. The main components are the target board, the scope, and the SCA framework.
1811		-
1812		-Figure 2 gives a detailed overview of the measurement setup that has been utilized to capture the power traces.
	2039	+Figure: Measurement setup. The main components are the target board, the scope, and the SCA framework.
	2040	+
	2041	+The figure above gives a detailed overview of the measurement setup used to capture the power traces.
1813	2042	The SCA evaluation framework ot-sca is the central component of the measurement setup.
1814	2043	It is responsible for communicating with the penetration testing framework that runs on the target FPGA board and with the scope.
1815	2044	Initially, ot-sca configures the scope (sample rate, number of samples) and the pentest framework (which input, how many encryptions, where to trigger).
@@ -1821,9 +2050,9 @@
1821	2050
1822	2051	![](../images/caliptra-rtl/docs/images/GHASH_TVLA_Figure3.png)
1823	2052	:--:
1824		-Figure 3: Power trace with AES encryption rounds visible (left). Aligned traces when zooming in (right).
1825		-
1826		-Figure 3 depicts power traces captured during AES-GCM encryptions with the setup above.
	2053	+Figure: Power trace with AES encryption rounds visible (left). Aligned traces when zooming in (right).
	2054	+
	2055	+The figure above depicts power traces captured during AES-GCM encryptions with the setup described above.
1827	2056	As shown in the figure, the traces are well aligned, which allows a sound evaluation.
1828	2057
1829	2058	##### Methodology
@@ -1835,9 +2064,9 @@
1835	2064
1836	2065	![](../images/caliptra-rtl/docs/images/GHASH_TVLA_Figure4.png)
1837	2066	:--
1838		-Figure 4: TVLA plot showing leakage at around sample 1000. When increasing the number of traces (from 1000 to 10000), the leakage becomes more present. Note that the traces shown in this plot are taken from an arbitrary cryptographic hardware block and not AES.
1839		-
1840		-Figure 4 shows a TVLA plot that will be used throughout this document. The red lines mark the ± t-test border.
	2067	+Figure: TVLA plot showing leakage at around sample 1000. When increasing the number of traces (from 1000 to 10000), the leakage becomes more present. Note that the traces shown in this plot are taken from an arbitrary cryptographic hardware block and not AES.
	2068	+
	2069	+The figure above shows the type of TVLA plot used throughout this document. The red lines mark the ± t-test threshold.
1841	2070
1842	2071	###### Dataset Generation for FvsR IV & Key
1843	2072
@@ -1878,26 +2107,26 @@
1878	2107
1879	2108	![](../images/caliptra-rtl/docs/images/GHASH_TVLA_Figure5.png)
1880	2109	:--:
1881		-Figure 5: AES-GCM block diagram. Red lines mark the trigger windows for each analysis step.
1882		-
1883		-As shown in Figure 5, we focus on analyzing (i) the generation of the hash subkey H, (ii) the encryption of the initial counter block S, (iii) the processing of the AAD blocks, (iv) the plaintext blocks, and (v) the tag generation. Each measurement is conducted with (a) masks off and (b) masks on to analyze the effectiveness of the masking countermeasure.
	2110	+Figure: AES-GCM block diagram. Red lines mark the trigger windows for each analysis step.
	2111	+
	2112	+As shown in the figure above, we focus on analyzing (i) the generation of the hash subkey H, (ii) the encryption of the initial counter block S, (iii) the processing of the AAD blocks, (iv) the plaintext blocks, and (v) the tag generation. Each measurement is conducted with (a) masks off and (b) masks on to analyze the effectiveness of the masking countermeasure.
1884	2113
1885	2114	###### i) SCA Evaluation of Generating the Hash Subkey H
1886	2115
1887	2116	![](../images/caliptra-rtl/docs/images/GHASH_TVLA_Figure6ab.png)
1888	2117	:--:
1889	2118
1890		-\| Figure 6a: Masking Off - 100k traces - Figure 6b: Masking On - 1M traces \|
	2119	+\| Figure: Masking Off - 100k traces - Figure: Masking On - 1M traces \|
1891	2120
1892	2121
1893	2122	###### Interpretation
1894	2123
1895		-The AES encryption is clearly visible in the form of 12 distinct peaks in the power traces shown Figures 6a and 6b.
	2124	+The AES encryption is clearly visible in the form of 12 distinct peaks in the power traces shown above.
1896	2125	The 12 peaks correspond first to loading the key and the all-zero block into the AES cipher core, followed by the initial round and the 10 full AES rounds (AES-128).
1897	2126	They spread over approximately 470 samples, which corresponds to the 56 target clock cycles that a full AES-128 encryption takes.
1898	2127
1899		-If the masking is turned off (Figure 6a), first and second-order leakage is clearly visible throughout the operation.
1900		-If the masking is on (Figure 6b), there is first-order leakage 1) at the beginning as well as 2) at the end of the operation.
	2128	+If the masking is turned off (first set of graphs), first- and second-order leakage is clearly visible throughout the operation.
	2129	+If the masking is turned on (second set of graphs), there is first-order leakage 1) at the beginning and 2) at the end of the operation.
1901	2130
1902	2131	1. The leakage at the beginning of the operation is due to incrementing the IV/CTR value (inc32 function in GCM spec) which spreads across the first two AES rounds.
1903	2132	This produces first-order leakage as the inc32 function implementation isn’t masked.
@@ -1907,26 +2136,26 @@
1907	2136	The leakage is most likely due to how the FPGA implementation tool maps the flip flops of the hash subkey register shares to the available FPGA logic slices: if flip flops of the different shares get mapped to the same logic slice, the carry-chain and other muxing logic present in the logic slice can combine the various inputs thereby causing SCA leakage despite these logic outputs not being used.
1908	2137	We’ve observed similar effects in the past and there is [research giving more insight into this and other FPGA-specific issues](https://ieeexplore.ieee.org/document/10545383).
1909	2138
1910		-To summarize, the observed first-order leakage if masking is on (Figure 6b) is not of concern for ASIC implementations.
	2139	+To summarize, the first-order leakage observed with masking on is not a concern for ASIC implementations.
1911	2140
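For reference, the unmasked inc32 step mentioned above is the counter update defined in NIST SP 800-38D. It increments only the rightmost 32 bits of the 128-bit counter block, modulo 2^32, and leaves the leftmost 96 bits unchanged. The Python sketch below follows that definition. It is a software model only, not the RTL.

```python
def inc32(block: bytes) -> bytes:
    """GCM inc32: increment the rightmost 32 bits of a 128-bit block modulo 2^32."""
    if len(block) != 16:
        raise ValueError("GCM counter blocks are 16 bytes")
    counter = (int.from_bytes(block[12:], "big") + 1) % 2**32
    return block[:12] + counter.to_bytes(4, "big")
```

Because the IV-derived upper 96 bits pass through unchanged and only a 32-bit addition is performed, this logic handles key-independent but IV-dependent data in the clear. That is why it shows up as first-order leakage in the FvsR IV & Key experiments.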
1912	2141	###### ii) SCA Evaluation of Encrypting the Initial Counter Block
1913	2142
1914	2143	![](../images/caliptra-rtl/docs/images/GHASH_TVLA_Figure7ab.png)
1915	2144	:--:
1916	2145
1917		-\| Figure 7a: Masking Off - 100k traces - Figure 7b: Masking On - 1M traces \|
	2146	+\| Figure: Masking Off - 100k traces - Figure: Masking On - 1M traces \|
1918	2147
1919	2148
1920	2149	###### Interpretation
1921	2150
1922		-Again, the AES encryption is clearly visible in the form of 12 peaks in the power traces shown Figures 7a and 7b.
	2151	+Again, the AES encryption is clearly visible in the form of 12 peaks in the power traces shown above.
1923	2152	This AES encryption corresponds to the generation of the encrypted initial counter block S.
1924	2153	The AES encryption is followed by another operation visible in the power trace: the computation of repeatedly used correction terms using the Galois-field multipliers inside GHASH.
1925	2154	This operation takes 33 target clock cycles (approximately 275 samples).
1926	2155
1927		-If the masking is turned off (Figure 7a), first and second-order leakage is clearly visible throughout both operations while being more pronounced during the GHASH operation.
	2156	+If the masking is turned off (first set of graphs), first- and second-order leakage is clearly visible throughout both operations, while being more pronounced during the GHASH operation.
1928	2157	This is because the GHASH block is smaller and thus produces less noise.
1929		-If the masking is on (Figure 7b), there is first-order leakage 1) at the beginning as well as 2) between the two operations.
	2158	+If the masking is turned on (second set of graphs), there is first-order leakage 1) at the beginning and 2) between the two operations.
1930	2159
1931	2160	1. As before, the leakage at the beginning of the operation is due to incrementing the IV/CTR value (inc32 function in GCM spec) which spreads across the first two AES rounds.
1932	2161	This produces first-order leakage as the inc32 function implementation isn’t masked.
@@ -1936,7 +2165,7 @@
1936	2165	As before, the leakage is most likely due to how the FPGA implementation tool maps the multiplexers in front of the GHASH state registers to the available FPGA logic slices: Since the multiplexers for both shares use the same control signals, the multiplexing logic can be combined even into the same look-up tables (LUTs) thereby causing SCA leakage.
1937	2166	We’ve observed similar effects in the past and there is [research giving more insight into this and other FPGA-specific issues](https://ieeexplore.ieee.org/document/10545383).
1938	2167
1939		-To summarize, the observed first-order leakage if masking is on (FIgure 7b) is not of concern for ASIC implementations.
	2168	+To summarize, the first-order leakage observed with masking on is not a concern for ASIC implementations.
1940	2169
1941	2170	###### iii) SCA Evaluation of Processing the AAD Blocks
1942	2171
@@ -1945,31 +2174,31 @@
1945	2174	![](../images/caliptra-rtl/docs/images/GHASH_TVLA_Figure8ab.png)
1946	2175	:--:
1947	2176
1948		-\| Figure 8a: Masking Off - 50k traces - Figure 8b: Masking On - 10M traces \|
	2177	+\| Figure: Masking Off - 50k traces - Figure: Masking On - 10M traces \|
1949	2178
1950	2179
1951	2180	###### Interpretation
1952	2181
1953	2182	For AAD blocks, the AES cipher core is not involved.
1954	2183	However, during the computation of the first AAD block, the GHASH block needs to compute an additional correction term which is used for the very first block only.
1955		-If the masking is turned off (Figure 8a), first- and second-order leakage is clearly visible but only for the first activity block.
	2184	+If the masking is turned off (first set of graphs), first- and second-order leakage is clearly visible but only for the first activity block.
1956	2185	The second activity block involves computing the additional correction terms which requires Share 1 of the encrypted initial counter block to be multiplied by Share 1 of the hash subkey.
1957	2186	But since the masking is off, both these values are zero for both the fixed and the random set and hence there is no SCA leakage.
1958		-If the masking is turned on (Figure 8b), no SCA leakage is observable which is desirable.
	2187	+If the masking is turned on (second set of graphs), no SCA leakage is observable which is desirable.
1959	2188
1960	2189	###### Processing AAD Block 1
1961	2190
1962	2191	![](../images/caliptra-rtl/docs/images/GHASH_TVLA_Figure9ab.png)
1963	2192	:--:
1964	2193
1965		-\| Figure 9a: Masking Off - 50k traces - Figure 9b: Masking On - 10M traces \|
	2194	+\| Figure: Masking Off - 50k traces - Figure: Masking On - 10M traces \|
1966	2195
1967	2196
1968	2197	###### Interpretation
1969	2198
1970	2199	For the second AAD block (and any subsequent AAD blocks) there is only one activity block corresponding to the Galois-field multiplication.
1971		-If masking is turned off (Figure 9a), there is both first- and second-order leakage observable.
1972		-If the masking is turned on (Figure 9b), no SCA leakage is observable which is desirable.
	2200	+If masking is turned off (first set of graphs), there is both first- and second-order leakage observable.
	2201	+If the masking is turned on (second set of graphs), no SCA leakage is observable which is desirable.
1973	2202
1974	2203	###### iv) SCA Evaluation of Processing the PTX Blocks
1975	2204
@@ -1978,12 +2207,12 @@
1978	2207	![](../images/caliptra-rtl/docs/images/GHASH_TVLA_Figure10ab.png)
1979	2208	:--:
1980	2209
1981		-\| Figure 10a: Masking Off - 50k traces - Figure 10b: Masking On - 1M traces \|
	2210	+\| Figure: Masking Off - 50k traces - Figure: Masking On - 1M traces \|
1982	2211
1983	2212
1984	2213	###### Interpretation
1985	2214
1986		-Like in [ii) SCA Evaluation of Encrypting the Initial Counter Block](#ii-sca-evaluation-of-encrypting-the-initial-counter-block) there is first-order leakage 1) at the beginning and 2) between the two operations if the masking is turned on (Figure 10b).
	2215	+Like in [ii) SCA Evaluation of Encrypting the Initial Counter Block](#ii-sca-evaluation-of-encrypting-the-initial-counter-block), there is first-order leakage 1) at the beginning and 2) between the two operations if the masking is turned on (second set of graphs).
1987	2216
1988	2217	1. As before, the leakage at the beginning of the operation is due to incrementing the IV/CTR value (inc32 function in GCM spec) which spreads across the first two AES rounds.
1989	2218	This produces first-order leakage as the inc32 function implementation isn’t masked.
@@ -1993,14 +2222,14 @@
1993	2222	But since the AAD and the plaintext have been chosen to be the same for all traces in the fixed and the random sets, the traces of the fixed set all produce the same ciphertext and are thus expected to exhibit a static power signature for this step, whereas the ciphertext of the random set is randomized through the random key and IV.
1994	2223	However, since the ciphertext is not secret in the context of GCM, this leakage is of no concern.
1995	2224
1996		-To summarize, the observed first-order leakage if masking is on (FIgure 10b) is not of concern.
	2225	+To summarize, the first-order leakage observed with masking on (second set of graphs) is not a concern.
1997	2226
1998	2227	###### Processing PTX Block 1
1999	2228
2000	2229	![](../images/caliptra-rtl/docs/images/GHASH_TVLA_Figure11ab.png)
2001	2230	:--:
2002	2231
2003		-\| Figure 11a: Masking Off - 50k traces - Figure 11b: Masking On - 1M traces \|
	2232	+\| Figure: Masking Off - 50k traces - Figure: Masking On - 1M traces \|
2004	2233
2005	2234
2006	2235	###### Interpretation
@@ -2013,7 +2242,7 @@
2013	2242	![](../images/caliptra-rtl/docs/images/GHASH_TVLA_Figure12ab.png)
2014	2243	:--:
2015	2244
2016		-\| Figure 12a: Masking Off - 50k traces - Figure 12b: Masking On - 1M traces \|
	2245	+\| Figure: Masking Off - 50k traces - Figure: Masking On - 1M traces \|
2017	2246
2018	2247
2019	2248	###### Interpretation
@@ -2023,12 +2252,12 @@
2023	2252	The GHASH state is unmasked (still masked with the encrypted initial counter block S) and Share 1 of S is added to write the final authentication tag to the data output registers readable by software.
2024	2253	2) In parallel to writing the final authentication tag to the data output registers, the internal state is all cleared to random values and an additional multiplication is triggered to clear the internal state of the Galois-field multipliers and the correction term registers.
2025	2254
2026		-If masking is turned off (Figure 12a), there is both first- and second-order leakage observable during the first activity block (tag generation) but not during the clearing operation.
2027		-If the masking is turned on (Figure 12b), some SCA leakage is observable between the two operations, i.e., when the final authentication tag is written to the output data registers.
	2255	+If masking is turned off (first set of graphs), there is both first- and second-order leakage observable during the first activity block (tag generation) but not during the clearing operation.
	2256	+If the masking is turned on (second set of graphs), some SCA leakage is observable between the two operations, i.e., when the final authentication tag is written to the output data registers.
2028	2257	This leakage is expected as both the fixed and the random data sets use a static AAD and plaintext.
2029	2258	This means that the tag for the fixed data set is fixed, whereas the tags for the random set are randomized through the ciphertext (random due to the random key and IV).
2030	2259
2031		-To summarize, the observed first-order leakage if masking is on (FIgure 12b) is not of concern.
	2260	+To summarize, the first-order leakage observed with masking on (second set of graphs) is not a concern.
2032	2261
2033	2262	##### Results – FvsR PTX & AAD
2034	2263
@@ -2040,16 +2269,16 @@
2040	2269	![](../images/caliptra-rtl/docs/images/GHASH_TVLA_Figure13ab.png)
2041	2270	:--:
2042	2271
2043		-\| Figure 13a: Masking Off - 50k traces - Figure 13b: Masking On - 1M traces \|
	2272	+\| Figure: Masking Off - 50k traces - Figure: Masking On - 1M traces \|
2044	2273
2045	2274
2046	2275	###### Interpretation
2047	2276
2048		-There is no SCA leakage visible in both cases without masking (Figure 13a) and with masking turned on (Figure 13b).
	2277	+No SCA leakage is visible in either case, neither without masking (first set of graphs) nor with masking turned on (second set of graphs).
2049	2278	This is expected as the hash subkey generation doesn’t involve the plaintext and the AAD but only the key and IV.
2050	2279	Both the fixed and random set use the same static key and IV.
2051	2280
2052		-This experiment was specifically done to check whether the leakage identified in Figure 6b and attributed to how the FPGA implementation tool maps the flip flops of the hash subkey register shares to the available FPGA logic slices.
	2281	+This experiment was specifically done to check whether the leakage identified in [i) SCA Evaluation of Generating the Hash Subkey H](#i-sca-evaluation-of-generating-the-hash-subkey-h) is indeed caused by how the FPGA implementation tool maps the flip flops of the hash subkey register shares to the available FPGA logic slices.
2053	2282	As expected, the leakage peak is now gone.
2054	2283
2055	2284	###### ii) SCA Evaluation of Encrypting the Initial Counter Block
@@ -2057,16 +2286,16 @@
2057	2286	![](../images/caliptra-rtl/docs/images/GHASH_TVLA_Figure14ab.png)
2058	2287	:--:
2059	2288
2060		-\| Figure 14a: Masking Off - 50k traces - Figure 14b: Masking On - 1M traces \|
	2289	+\| Figure: Masking Off - 50k traces - Figure: Masking On - 1M traces \|
2061	2290
2062	2291
2063	2292	###### Interpretation
2064	2293
2065		-There is no SCA leakage visible in both cases without masking (Figure 14a) and with masking turned on (Figure 14b).
	2294	+No SCA leakage is visible in either case, neither without masking (first set of graphs) nor with masking turned on (second set of graphs).
2066	2295	This is expected as the encryption of the initial counter block and the subsequent computation of repeatedly used correction terms doesn’t involve the plaintext and the AAD but only the key and IV.
2067	2296	Both the fixed and random set use the same static key and IV.
2068	2297
2069		-This experiment was specifically done to check whether the leakage identified in Figure 7b and attributed to how the FPGA implementation tool maps the multiplexers in front of the GHASH state registers to the available FPGA logic slices.
	2298	+This experiment was specifically done to check whether the leakage identified in [ii) SCA Evaluation of Encrypting the Initial Counter Block](#ii-sca-evaluation-of-encrypting-the-initial-counter-block) is indeed caused by how the FPGA implementation tool maps the multiplexers in front of the GHASH state registers to the available FPGA logic slices.
2070	2299	As expected, the leakage peak is now gone.
2071	2300
2072	2301	###### iv) SCA Evaluation of Processing the PTX Block 0
@@ -2074,12 +2303,12 @@
2074	2303	![](../images/caliptra-rtl/docs/images/GHASH_TVLA_Figure15ab.png)
2075	2304	:--:
2076	2305
2077		-\| Figure 15a: Masking Off - 100k traces - Figure 15b: Masking On - 1M traces \|
	2306	+\| Figure: Masking Off - 100k traces - Figure: Masking On - 1M traces \|
2078	2307
2079	2308
2080	2309	###### Interpretation
2081	2310
2082		-With the masking turned off (Figure 15a), there is first-order leakage 1) at the beginning of the operation and 2) throughout the entire GHASH operation.
	2311	+With the masking turned off (first set of graphs), there is first-order leakage 1) at the beginning of the operation and 2) throughout the entire GHASH operation.
2083	2312
2084	2313	1. The leakage at the beginning of the operation is due to the input data (the plaintext) being written to an internal buffer register.
2085	2314	The AES cipher is operated in counter mode, meaning it doesn’t encrypt the input data but the counter value (incremented IV).
@@ -2088,7 +2317,7 @@
2088	2317	2. The GHASH operation then processes this ciphertext.
2089	2318	The observed leakage when the masking is off is expected.
2090	2319
2091		-With the masking turned on (Figure 15b), the first-order leakage at the beginning of the operation remains visible. The reason for this is that the internal register buffering the previous input data is not masked.
	2320	+With the masking turned on (second set of graphs), the first-order leakage at the beginning of the operation remains visible. The reason for this is that the internal register buffering the previous input data is not masked.
2092	2321	This is of no concern as the leakage is not related to the key or IV.
2093	2322
2094	2323	Another first-order leakage peak is visible between the AES encryption and the GHASH operation.
@@ -2292,9 +2521,9 @@
2292	2521	\| Lock wr\[0\] \| core_only_rst_b \| Setting the lock wr field prevents the entry from being written by the microcontroller. Keys are always locked. After a lock is set, it cannot be reset until cptra_rst_b is de-asserted. \|
2293	2522	\| Lock use\[1\] \| core_only_rst_b \| Setting the lock use field prevents the entry from being used in any cryptographic blocks. After the lock is set, it cannot be reset until cptra_rst_b is de-asserted. \|
2294	2523	\| Clear\[2\] \| cptra_rst_b \| If unlocked, setting the clear bit causes KV to clear the associated entry. The clear bit is reset after entry is cleared. \|
2295		-\| Copy\[3\] \| cptra_rst_b \| ENHANCEMENT: Setting the copy bit causes KV to copy the key to the entry written to Copy Dest field. \|
2296		-\| Copy Dest\[8:4\] \| cptra_rst_b \| ENHANCEMENT: Destination entry for the copy function. \|
2297		-\| Dest_valid\[16:9\] \| hard_reset_b \| KV entry can be used with the associated cryptographic block if the appropriate index is set. <br>\[0\] - HMAC KEY <br>\[1\] - HMAC BLOCK <br>\[2\] - MLDSA SEED <br>\[3\] - ECC PRIVKEY <br>\[4\] - ECC SEED <br>\[5\] - AES KEY <br>\[7:6\] - RSVD \|
	2524	+\| rsvd0\[3\] \| \| Reserved \|
	2525	+\| rsvd1\[8:4\] \| \| Reserved \|
	2526	+\| Dest_valid\[16:9\] \| hard_reset_b \| KV entry can be used with the associated cryptographic block if the appropriate index is set. <br>\[0\] - HMAC KEY <br>\[1\] - HMAC BLOCK <br>\[2\] - MLDSA SEED <br>\[3\] - ECC PRIVKEY <br>\[4\] - ECC SEED <br>\[5\] - AES KEY <br>\[6\] - MLKEM SEED <br>\[7\] - MLKEM MSG <br>\[8\] - AXI DMA DATA \|
2298	2527	\| last_dword\[20:19\] \| hard_reset_b \| Store the offset of the last valid dword, used to indicate the last cycle for read operations. \|
2299	2528
2300	2529
@@ -2310,7 +2539,11 @@
2310	2539
2311	2540	Similarly, after programming the key vault write control and initiating the cryptographic function that generates the key to be written, FW needs to query the associated key vault write status to confirm that the requested key was generated and written successfully.
2312	2541
	2542	+While the crypto engine, key vault read, or key vault write blocks are active, the read and write control registers are locked. After reading the status register and confirming that the operation was successful, the next key vault control can be programmed.
	2543	+
2313	2544	When a key is read from the key vault, the API register is locked and any result generated from the cryptographic block is not readable by firmware. The digest can only be sent to the key vault by appropriately programming the key vault write controls. After the cryptographic block completes its operation, the lock is cleared and the key is cleared from the API registers.
	2545	+
	2546	+Key vault read errors will prevent the crypto engine from accepting new commands. The engine will require zeroization in order to clear the error and resume normal operation.
2314	2547
2315	2548	If multiple iterations of the cryptographic function are required, the key vault read and write controls must be programmed for each iteration. This ensures that the lock is set and the digest is not readable.
2316	2549
@@ -2334,7 +2567,10 @@
2334	2567	\| ecc_pkey_dest_valid\[9\] \| ECC PKEY is a valid destination. \|
2335	2568	\| ecc_seed_dest_valid\[10\] \| ECC SEED is a valid destination. \|
2336	2569	\| aes_key_dest_valid\[11\] \| AES KEY is a valid destination. \|
2337		-\| rsvd\[31:12\] \| Reserved field \|
	2570	+\| mlkem_seed_dest_valid\[12\] \| MLKEM SEED is a valid destination. \|
	2571	+\| mlkem_msg_dest_valid\[13\] \| MLKEM MSG is a valid destination. \|
	2572	+\| dma_data_dest_valid\[14\] \| DMA DATA is a valid destination. \|
	2573	+\| rsvd\[31:15\] \| Reserved field \|
2338	2574
2339	2575
2340	2576	\| KV Status Reg \| Description \|
@@ -2342,6 +2578,41 @@
2342	2578	\| ready\[0\] \| Key vault control is idle and ready for a command. \|
2343	2579	\| valid\[1\] \| Requested flow is done. \|
2344	2580	\| error\[9:2\] \| SUCCESS - 0x0 - Key Vault flow was successful <br>KV_READ_FAIL - 0x1 - Key Vault Read flow failed <br>KV_WRITE_FAIL - 0x2 - Key Vault Write flow failed \|
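
Firmware queries this status register after each key vault read or write flow. As a rough illustration, the following Python sketch decodes a raw status value into the fields listed in the table. The function names and string error labels are hypothetical. Only the bit positions and encodings come from the table, and the register access itself is not modeled.

```python
# Hypothetical decoder for the KV status register fields:
# ready[0], valid[1], error[9:2].
KV_ERROR_NAMES = {
    0x0: "SUCCESS",
    0x1: "KV_READ_FAIL",
    0x2: "KV_WRITE_FAIL",
}


def decode_kv_status(reg: int) -> dict:
    """Split a raw 32-bit KV status value into its documented fields."""
    ready = bool(reg & 0x1)          # bit 0: control idle, ready for a command
    valid = bool((reg >> 1) & 0x1)   # bit 1: requested flow is done
    error = (reg >> 2) & 0xFF        # bits 9:2: error code
    return {
        "ready": ready,
        "valid": valid,
        "error": KV_ERROR_NAMES.get(error, f"UNKNOWN(0x{error:x})"),
    }


def kv_flow_succeeded(reg: int) -> bool:
    """True once the flow has completed and reported SUCCESS."""
    status = decode_kv_status(reg)
    return status["valid"] and status["error"] == "SUCCESS"
```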
	2581	+
	2582	+
	2583	+### Key vault endianness and byte ordering
	2584	+
	2585	+The Key Vault stores each entry as an array of 16 DWORDs (32-bit words), indexed KV\[0\] through KV\[15\]. The KV read and write clients perform byte and DWORD ordering transformations so that data written by one engine can be correctly consumed by another.
	2586	+
	2587	+The KV write client has a configurable parameter, `KV_WRITE_SWAP_DWORDS`, that controls DWORD ordering when writing result data into a KV entry. When set to 1 (default), the write client reverses DWORD order so that KV\[0\] holds the most-significant DWORD: KV\[offset\] = data\[N−1−offset\]. When set to 0, DWORDs are stored sequentially: KV\[offset\] = data\[offset\]. The KV read client always reads sequentially from KV\[0\] through KV\[15\]; each engine applies its own register-level mapping.
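
The following Python sketch models the write-client placement rule and the sequential read order described above. It is an illustrative model, not RTL. The function names are hypothetical, and unused DWORDs of the entry are shown as zero only to keep the example simple.

```python
KV_ENTRY_DWORDS = 16  # KV[0] through KV[15]


def kv_write(data: list, swap_dwords: bool = True) -> list:
    """Place N result DWORDs into a KV entry.

    swap_dwords=True  models KV_WRITE_SWAP_DWORDS=1: KV[offset] = data[N-1-offset]
    swap_dwords=False models KV_WRITE_SWAP_DWORDS=0: KV[offset] = data[offset]
    Unused DWORDs are modeled as 0 (an illustration-only assumption).
    """
    n = len(data)
    if n > KV_ENTRY_DWORDS:
        raise ValueError("result does not fit in one KV entry")
    entry = [0] * KV_ENTRY_DWORDS
    for offset in range(n):
        entry[offset] = data[n - 1 - offset] if swap_dwords else data[offset]
    return entry


def kv_read(entry: list, n: int) -> list:
    """The KV read client always streams KV[0], KV[1], ... in order."""
    return entry[:n]
```

With the default setting, a result whose most-significant DWORD sits at the highest internal index ends up with that DWORD in KV\[0\].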
	2588	+
	2589	+#### Per-engine endianness conventions
	2590	+
	2591	+\| Engine \| Native endianness \| KV write SWAP\_DWORDS \| KV read register mapping \| Notes \|
	2592	+\| :----- \| :---------------- \| :-------------------- \| :----------------------- \| :---- \|
	2593	+\| HMAC-512 \| Big-endian \| 1 (default) \| Sequential: BLOCK\[d\] = KV\[d\], KEY\[d\] = KV\[d\] \| Block read supports PAD and HMAC auto-padding. \|
	2594	+\| SHA-512 \| Big-endian \| 1 (default) \| Sequential: BLOCK\[d\] = KV\[d\] \| Block read supports PAD. \|
	2595	+\| ECC (P-384) \| Big-endian \| 1 (default) \| Sequential: PRIVKEY\[d\] = KV\[d\], SEED\[d\] = KV\[d\] \| — \|
	2596	+\| AES \| Little-endian \| 0 \| Byte swap per DWORD: key\_reg\[d\]\[b\] = KV\_data\[3−b\] \| CTRL0.ENDIAN\_SWAP optionally swaps bytes in FW DATA\_IN/DATA\_OUT registers. \|
	2597	+\| ML-KEM \| Little-endian \| 0 \| DWORD-reversed: SEED\_D\[d\] = KV\[N−1−d\], SEED\_Z\[i\] = KV\[2N−1−i\] \| Shared key undergoes DWORD reversal in the ABR controller before the write client. \|
	2598	+\| ML-DSA \| Little-endian \| N/A (no KV write) \| DWORD-reversed: SEED\[d\] = KV\[N−1−d\] \| — \|
	2599	+
	2600	+
	2601	+Write path: HMAC, SHA-512, and ECC produce results with the most-significant DWORD at the highest internal index; the write client reversal (SWAP\_DWORDS=1) places the most-significant DWORD at KV\[0\]. AES stores its 128-bit (4 DWORD) output sequentially. The ML-KEM shared key is pre-reversed in the ABR controller (`mlkem_sharedkey_data[d] = shared_key[SHAREDKEY_NUM_DWORDS-1-d]`), producing the same KV layout as the big-endian engines despite using SWAP\_DWORDS=0.
	2602	+
	2603	+Read path: Big-endian engines (HMAC, SHA-512, ECC) use sequential mapping; register\[d\] receives KV\[d\]. AES applies a per-DWORD byte swap to convert from big-endian to its little-endian internal format. ML-KEM and ML-DSA reverse the DWORD index (`SEED[d]` is written when `kv_read_offset == N-1-d`), producing a full byte reversal of the original data.
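
A minimal Python model of the per-engine read-path mappings above, assuming the KV entry is represented as a list of 32-bit integers. The function names are hypothetical.

```python
def bswap32(x: int) -> int:
    """Reverse the byte order of a 32-bit value."""
    return int.from_bytes((x & 0xFFFFFFFF).to_bytes(4, "big"), "little")


def read_sequential(kv: list, n: int) -> list:
    """HMAC, SHA-512, ECC: register[d] = KV[d]."""
    return [kv[d] for d in range(n)]


def read_aes_key(kv: list, n: int) -> list:
    """AES: per-DWORD byte swap, key_reg[d][b] = KV_data[3-b]."""
    return [bswap32(kv[d]) for d in range(n)]


def read_dword_reversed(kv: list, n: int) -> list:
    """ML-KEM / ML-DSA: SEED[d] takes the DWORD read at kv_read_offset == N-1-d."""
    return [kv[n - 1 - d] for d in range(n)]
```

Serializing the output of `read_dword_reversed` as little-endian DWORDs yields the big-endian bytes of the KV entry in reverse order, which is the full byte reversal mentioned above.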
	2604	+
	2605	+#### Firmware byte-ordering rules
	2606	+
	2607	+When firmware passes data between engines via software registers (without using KV), it must perform the following transformations. In this table, "big-endian" means the lowest-addressed register (index 0) holds the most-significant DWORD; "little-endian" means index 0 holds the least-significant DWORD. AES is little-endian but additionally byte-swaps each DWORD on the KV read path, so firmware must apply `BSWAP32` per DWORD when writing AES key registers directly.
	2608	+
	2609	+\| Source → Destination \| Transformation \| Example \|
	2610	+\| :--- \| :--- \| :--- \|
	2611	+\| Big-endian → big-endian \| Copy DWORDs directly \| HMAC tag → ECC seed \|
	2612	+\| Big-endian → little-endian \| Reverse DWORD order: DEST\[i\] = SRC\[N−1−i\] \| HMAC tag → ML-KEM seed \|
	2613	+\| Big-endian → AES \| Byte-swap each DWORD: AES\_KEY\[i\] = BSWAP32(src\[i\]) \| HMAC tag → AES key \|
	2614	+\| Little-endian → AES \| Reverse DWORDs and byte-swap each: AES\_KEY\[i\] = BSWAP32(src\[N−1−i\]) \| ML-KEM shared key → AES key \|
	2615	+\| Little-endian → big-endian (non-AES) \| Reverse DWORD order only: DEST\[i\] = src\[N−1−i\] \| ML-KEM shared key → HMAC block \|
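
The transformations in this table can be sketched in Python as follows. Each value is treated as a list of 32-bit DWORDs with index 0 at the lowest register address. The helper names are hypothetical. Firmware would apply the same index and byte operations when moving data between register blocks.

```python
def bswap32(x: int) -> int:
    """BSWAP32: reverse the four bytes of a 32-bit value."""
    return int.from_bytes((x & 0xFFFFFFFF).to_bytes(4, "big"), "little")


def be_to_be(src):   # e.g. HMAC tag -> ECC seed: copy DWORDs directly
    return list(src)


def be_to_le(src):   # e.g. HMAC tag -> ML-KEM seed: DEST[i] = SRC[N-1-i]
    return list(reversed(src))


def be_to_aes(src):  # e.g. HMAC tag -> AES key: AES_KEY[i] = BSWAP32(src[i])
    return [bswap32(w) for w in src]


def le_to_aes(src):  # e.g. ML-KEM shared key -> AES key: BSWAP32(src[N-1-i])
    return [bswap32(w) for w in reversed(src)]


def le_to_be(src):   # e.g. ML-KEM shared key -> HMAC block: DEST[i] = src[N-1-i]
    return list(reversed(src))
```

Note that `le_to_aes(x)` is the same as `be_to_aes(le_to_be(x))`: the little-endian source is first brought to big-endian order, then byte-swapped per DWORD for AES.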
2345	2616
2346	2617
2347	2618	### De-obfuscation engine
@@ -2363,12 +2634,12 @@
2363	2634
2364	2635	### Key vault de-obfuscation block operation
2365	2636
2366		-A de-obfuscation engine (DOE) is used in conjunction with AES cryptography to de-obfuscate the UDS and field entropy.
2367		-
2368		-1. The obfuscation key is driven to the AES key. The data to be decrypted (either obfuscated UDS or obfuscated field entropy) is fed into the AES data.
2369		-2. An FSM manually drives the AES engine and writes the decrypted data back to the key vault.
2370		-3. FW programs the DOE with the requested function (UDS or field entropy de-obfuscation), and the destination for the result.
2371		-4. After de-obfuscation is complete, FW can clear out the UDS and field entropy values from any flops until cptra\_pwrgood de-assertion.
	2637	+A de-obfuscation engine (DOE) is used in conjunction with AES cryptography to de-obfuscate the UDS, field entropy, and HEK seed.
	2638	+
	2639	+1. The obfuscation key is wired to the DOE. The data to be decrypted (obfuscated UDS, obfuscated field entropy, or obfuscated HEK seed) is fed into the DOE data.
	2640	+2. An FSM manually drives the DOE engine and writes the decrypted data back to the key vault.
	2641	+3. FW programs the DOE with the requested function (UDS, field entropy, or HEK seed de-obfuscation), and the destination for the result.
	2642	+4. After de-obfuscation is complete, FW can clear out the UDS, field entropy, and HEK seed values from any flops until cptra\_pwrgood de-assertion.
2372	2643
2373	2644	The following tables describe DOE register and control fields.
2374	2645
@@ -2381,13 +2652,14 @@
2381	2652
2382	2653	\| DOE Ctrl Fields \| Reset \| Description \|
2383	2654	\| :--------------- \| :----------- \| :------------------------------------------------------------------------------------------------------------------------------------------- \|
2384		-\| COMMAND\[1:0\] \| Cptra_rst_b \| 2’b00 Idle <br>2’b01 Run UDS flow <br>2’b10 Run FE flow <br>2’b11 Clear Obf Secrets \|
2385		-\| DEST\[4:2\] \| Cptra_rst_b \| Destination register for the result of the de-obfuscation flow. Field entropy writes into DEST and DEST+1 <br>Key entry only, can’t go to PCR . \|
	2655	+\| CMD\[1:0\] \| Cptra_rst_b \| 2’b00 Idle <br>2’b01 Run UDS flow <br>2’b10 Run FE flow <br>2’b11 Clear Obf Secrets \|
	2656	+\| DEST\[6:2\] \| Cptra_rst_b \| Destination register for the result of the de-obfuscation flow. Field entropy writes into DEST and DEST+1. <br>Key entry only; cannot target a PCR. \|
	2657	+\| CMD_EXT\[8:7\] \| Cptra_rst_b \| 2’b00 Idle (or running a standard, non-extended command) <br>2’b01 Run OCP LOCK HEK seed flow <br>2’b10 RESERVED <br>2’b11 RESERVED \|
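
The CTRL field layout above can be packed as shown in this Python sketch. The constant and function names are hypothetical. The sketch only checks field widths and reserved CMD_EXT values. It does not encode any rule about combining CMD and CMD_EXT, which the table does not specify.

```python
# Encodings from the DOE CTRL table (constant names are hypothetical).
DOE_CMD_IDLE = 0b00
DOE_CMD_UDS = 0b01
DOE_CMD_FE = 0b10
DOE_CMD_CLEAR_OBF = 0b11
DOE_CMD_EXT_NONE = 0b00
DOE_CMD_EXT_HEK_SEED = 0b01  # 2'b10 and 2'b11 are reserved


def doe_ctrl(cmd: int, dest: int, cmd_ext: int = DOE_CMD_EXT_NONE) -> int:
    """Pack CMD[1:0], DEST[6:2], and CMD_EXT[8:7] into one CTRL value."""
    if not 0 <= cmd <= 0b11:
        raise ValueError("CMD is a 2-bit field")
    if not 0 <= dest <= 0b11111:
        raise ValueError("DEST is a 5-bit field")
    if cmd_ext not in (DOE_CMD_EXT_NONE, DOE_CMD_EXT_HEK_SEED):
        raise ValueError("CMD_EXT values 2'b10 and 2'b11 are reserved")
    return cmd | (dest << 2) | (cmd_ext << 7)
```

For example, requesting the HEK seed flow with KV22 as the destination, `doe_ctrl(DOE_CMD_IDLE, 22, DOE_CMD_EXT_HEK_SEED)`, evaluates to `0xD8`.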
2386	2658
2387	2659
2388	2660	### Key vault de-obfuscation flow
2389	2661
2390		-1. ROM loads IV into DOE. ROM writes to the DOE control register the destination for the de-obfuscated result and sets the appropriate bit to run UDS and/or the field entropy flow.
	2662	+1. ROM loads the IV into the DOE. ROM writes the destination for the de-obfuscated result to the DOE control register and sets the appropriate bits to run the UDS, field entropy, and/or HEK seed flows.
2391	2663	2. DOE state machine takes over and loads the Caliptra obfuscation key into the key register.
2392	2664	3. Next, either the obfuscated UDS or the obfuscated field entropy is loaded into the block register 4 DWORDs at a time.
2393	2665	4. Results are written to the KV entry specified in the DEST field of the DOE control register.
@@ -2403,6 +2675,117 @@
2403	2675	* 4B scratchpad registers that are lockable but cleared on cold reset (8 registers)
2404	2676	* 4B scratchpad registers that are lockable but cleared on warm reset (10 registers)
2405	2677	* 4B scratchpad registers that are cleared on warm reset (8 registers)
	2678	+
	2679	+
	2680	+## OCP LOCK Hardware Architecture
	2681	+
	2682	+### Overview
	2683	+The following hardware and ROM/FW enhancements support the OCP L.O.C.K. (a.k.a. OCP LOCK) flows defined for SSD applications. The specification is available here:
	2684	+[OCP LOCK Spec](https://chipsalliance.github.io/Caliptra/ocp-lock/specification/HEAD)
	2685	+
	2686	+---
	2687	+
	2688	+### Additional Registers, Straps, and Macros for OCP LOCK
	2689	+
	2690	+- `SS_OCP_LOCK_CTRL.LOCK_IN_PROGRESS`
	2691	+ A status/control bit used to enforce the new Key Vault (KV) rules required by OCP LOCK. It is write-1-to-set: once enabled, OCP LOCK functionality persists until the register is cleared by a cold reset. See the dedicated section below for details on the behaviors this register enables.
	2692	+
	2693	+- `ss_ocp_lock_en` (constant-value input strap) with a corresponding bit in `CPTRA_HW_CONFIG` register named `OCP_LOCK_MODE_en`:
	2694	+ - Enables Caliptra ROM to perform OCP LOCK operations (e.g., using DOE for HEK seed de-obfuscation, Key Release via AXI DMA).
	2695	+ - Allows the ROM to set `SS_OCP_LOCK_CTRL.LOCK_IN_PROGRESS`.
	2696	+ - `ss_ocp_lock_en` is a strap pin and must be driven with a constant value by the integrator.
	2697	+ - The `CPTRA_HW_CONFIG` register samples this strap and stores its value in the `OCP_LOCK_MODE_en` bit.
	2698	+ - This bit is reflected in `CPTRA_HW_CONFIG` only if `CALIPTRA_MODE_SUBSYSTEM` is defined.
	2699	+
	2700	+- HEK seed fuse register
	2701	+ Holds the obfuscated HEK seed. ROM is responsible for performing the operation to de-obfuscate the HEK seed.
	2702	+
	2703	+- Key release address and size straps
	2704	+ Writable until `FUSE_WR_DONE`, then locked (same as fuses and other subsystem-mode straps).
	2705	+ - Address strap (`strap_ss_key_release_base_addr`): full destination address for key release; in OCP LOCK this is the destination for the MEK to be written. Firmware can derive the SFR base from this value as needed.
	2706	+ - Size strap (`strap_ss_key_release_key_size`): byte count of the key to program to the destination address via the key release operation; hardware requires a dword-aligned count. Hardware forces strap input values to a dword-aligned value. If control firmware updates this value (before `FUSE_WR_DONE` is set), it must use a dword-aligned value.
	2707	+
	2708	+Refer to the [Caliptra Integration Spec](https://github.com/chipsalliance/caliptra-rtl/blob/main/docs/CaliptraIntegrationSpecification.md) for more details about macros and strap pins.
	2709	+
	2710	+---
	2711	+
	2712	+### `SS_OCP_LOCK_CTRL.LOCK_IN_PROGRESS` Register Bit
	2713	+
	2714	+When/How it is set
	2715	+- Set by Caliptra ROM after performing OCP LOCK-related derivations (HEK, MDK, etc.).
	2716	+- Can be set iff (`ss_ocp_lock_en` is set to 1 AND `CALIPTRA_MODE_SUBSYSTEM` is defined).
	2717	+ - Once set, a value of 1 persists until the register is cleared by cold reset.
	2718	+
	2719	+Enforcements/Effects
	2720	+- Reserves key vault slots 0–15 for standard use-cases.
	2721	+- Reserves key vault slots 16–23 for OCP LOCK use-cases.
	2722	+ - Key Vault slot 16 (KV16) is reserved for holding the MDK
	2723	+ - Key Vault slot 23 (KV23) is reserved for holding the MEK
	2724	+- Blocks interactions between standard slots and LOCK slots: any crypto operation that uses a Key Vault input value (e.g., for Key, Block, or Seed inputs) may not write its output to a Key Vault slot in the other region. For example, when `SS_OCP_LOCK_CTRL.LOCK_IN_PROGRESS` is set, HMAC may not perform an operation that uses Key Vault slot 8 as the BLOCK input and writes the output TAG to Key Vault slot 17.
	2725	+- Enables Key Release via AXI DMA.
	2726	+- Enables AES engine to write output to Key Vault, which must use KV23.
	2727	+
	2728	+> Note: If `SS_OCP_LOCK_CTRL.LOCK_IN_PROGRESS` is `1`, this implies that `ss_ocp_lock_en` is `1` and `CALIPTRA_MODE_SUBSYSTEM` is defined.
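
The cross-region restriction can be summarized with a small Python predicate. This sketch covers only the slot-region rule (slots 0 to 15 versus 16 to 23). The per-slot rules for KV16, KV22, and KV23, the lock bits, and dest_valid checks are deliberately not modeled, and the function names are hypothetical.

```python
STANDARD_SLOTS = range(0, 16)  # KV0-KV15
LOCK_SLOTS = range(16, 24)     # KV16-KV23


def kv_region(slot: int) -> str:
    if slot in STANDARD_SLOTS:
        return "standard"
    if slot in LOCK_SLOTS:
        return "lock"
    raise ValueError(f"KV slot {slot} out of range")


def cross_region_ok(read_slots, dest_slot: int, lock_in_progress: bool) -> bool:
    """Check only that every KV input and the KV output share a region."""
    if not lock_in_progress:
        return True  # the region rule applies only while LOCK_IN_PROGRESS is set
    dest_region = kv_region(dest_slot)
    return all(kv_region(s) == dest_region for s in read_slots)
```

The HMAC example above, reading KV8 as BLOCK and writing the TAG to KV17, corresponds to `cross_region_ok([8], 17, True)` returning `False`.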
	2729	+
	2730	+---
	2731	+
	2732	+### AES Write Path
	2733	+
	2734	+- MEK is the final OCP LOCK key. It is decrypted and stored in KV23. After decryption, MEK may be transferred to its destination (as specified by the input strap) via AXI DMA.
	2735	+- OCP LOCK requires both the AES write path and a DMA path to the MEK destination.
	2736	+- Hardware enforcement: MEK is written to KV23. Hardware recognizes the MEK generation request if there is an AES-ECB decrypt operation with KV16 (MDK) as the AES-ECB key and routes the result accordingly. In this case, output of the decrypted plaintext via the AES dataout register API is blocked. Any Key Vault write operation requested for the AES output that does not meet these requirements results in a Key Vault write failure status.
	2737	+
	2738	+---
	2739	+
	2740	+### Key Vault Access Rules & Filtering (when `LOCK_IN_PROGRESS` is set)
	2741	+
	2742	+- KV23 (MEK destination): write-restricted to AES only.
	2743	+- KV22 (HEK): locked for writes until warm reset (ROM requirement).
	2744	+- KV16 (MDK): locked for writes until warm reset (ROM requirement).
	2745	+- If OCP LOCK mode is enabled:
	2746	+ - KV23 must not be used as input to other crypto operations—only as a Key Release source.
	2747	+ - AES-ECB decrypt with key = KV16 must have dest = KV23; otherwise the destination is FW.
	2748	+ Rationale: Prevents malicious FW from writing known values into other KV slots via AES.
	2749	+- Additional KV behaviors
	2750	+ - On write, hardware validates that the destination is legal for the source/read. If not valid, the Key Vault write operation returns a failing status.
	2751	+ - No parallel crypto operations are permitted for cryptographic blocks with access to the Key Vault. KV does not track this; Caliptra enforces this rule by evaluating each block's busy status indicator and signaling violations through the [CPTRA_HW_ERROR_FATAL](https://chipsalliance.github.io/caliptra-rtl/main/internal-regs/?p=clp.soc_ifc_reg.CPTRA_HW_ERROR_FATAL) register and the corresponding interrupt at the Caliptra top level.
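
The AES write-path rule can be expressed as a Python sketch: an AES-ECB decrypt keyed by KV16 may write only to KV23, and only while `LOCK_IN_PROGRESS` is set. Any other AES-to-KV write request fails. The parameter names, the mode string, and the use of `None` for a firmware-supplied key are assumptions for illustration. The returned labels reuse the KV status encodings from the status register table.

```python
MDK_SLOT = 16  # KV16 holds the MDK
MEK_SLOT = 23  # KV23 is the only legal AES destination


def aes_kv_write_status(lock_in_progress: bool, mode: str, decrypt: bool,
                        key_slot, dest_slot: int) -> str:
    """Status for an AES result requested to go to the Key Vault.

    key_slot is None when the AES key came from FW rather than from the KV.
    """
    is_mek_request = mode == "ECB" and decrypt and key_slot == MDK_SLOT
    if lock_in_progress and is_mek_request and dest_slot == MEK_SLOT:
        return "SUCCESS"
    return "KV_WRITE_FAIL"
```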
	2752	+
	2753	+---
	2754	+
	2755	+### HEK Seed De‑obfuscation
	2756	+
	2757	+- Executed by Caliptra ROM. The DOE supports a HEK seed de-obfuscation command that may be executed only once during a boot cycle. If Caliptra ROM does not run this flow to produce the HEK seed, it should run the flow with a dummy Key Vault slot to lock against future erroneous uses.
	2758	+- Hardware-supported HEK seed de-obfuscation path: Ratchet Fuse Register (obfuscated HEK seed) → DOE (with `OBF_KEY`) → KV slot 22 (de-obfuscated seed).
	2759	+- Caliptra ROM shall lock KV22 for writes immediately after it has derived the HEK into that slot.
	2760	+
	2761	+---
	2762	+
	2763	+### Key Release
	2764	+
	2765	+Caliptra's AXI DMA supports a hardware path to write KV23 (MEK) to the SoC via the AXI manager interface. The following rules constrain this operation:
	2766	+- Allowed only when `SS_OCP_LOCK_CTRL.LOCK_IN_PROGRESS` (sticky W1SET) is set by Caliptra ROM.
	2767	+- Destination and size must match the values from the straps:
	2768	+ - `strap_ss_key_release_base_addr`
	2769	+ - `strap_ss_key_release_key_size`
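
A hypothetical Python predicate for the key release checks listed above. Parameter names are illustrative. In hardware, the strap values come from `strap_ss_key_release_base_addr` and `strap_ss_key_release_key_size`.

```python
MEK_SLOT = 23  # the hardware key release path sources KV23 (MEK)


def key_release_allowed(lock_in_progress: bool, src_slot: int,
                        dest_addr: int, size_bytes: int,
                        strap_addr: int, strap_size: int) -> bool:
    """Return True only if a KV-to-AXI key release request meets the rules."""
    if not lock_in_progress:   # SS_OCP_LOCK_CTRL.LOCK_IN_PROGRESS must be set
        return False
    if src_slot != MEK_SLOT:   # only the MEK slot is released
        return False
    # Destination and size must match the strap values exactly.
    return dest_addr == strap_addr and size_bytes == strap_size
```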
	2770	+
	2771	+---
	2772	+
	2773	+### Additional Security Hardening Specific to OCP LOCK Enhancements
	2774	+
	2775	+Scan/Debug Protections
	2776	+- Flush DMA FIFOs to prevent leakage of secrets via scan chain.
	2777	+- Flush AES ↔ KV interface.
	2778	+
	2779	+AES/KV/DMA Robustness
	2780	+- AES → KV write path: The key can't be written to the key vault unless `key_size` bytes have been decrypted by AES.
	2781	+- Validate DMA `key_size`; raise an error if `key_size` exceeds 512 bits (64 bytes).
	2782	+- Avoid hangs when `key_size` != KV read DWORD count:
	2783	+ - On KV reads, if `key_size` is smaller than the KV entry, drop extra data (do not push to FIFO).
	2784	+- DMA KV read error: Raised on the first transfer cycle from KV to DMA; DMA transitions immediately to `DMA_ERROR` without issuing an AXI transfer.
	2785	+- KV write enable sourced from AES (during OCP LOCK) so it cannot be modified mid-transfer.
	2786	+- Enable AES ↔ KV write path only if `SS_OCP_LOCK_CTRL.LOCK_IN_PROGRESS` is set.
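
The `key_size` handling on the KV-to-DMA path can be sketched in Python as below. The exception and function names are hypothetical, and the KV read error is reduced to a flag. The sketch also assumes `key_size` is already dword-aligned, as the strap description requires.

```python
KV_ENTRY_DWORDS = 16
MAX_KEY_SIZE_BYTES = 512 // 8  # 512-bit limit = 64 bytes


class DmaKvReadError(Exception):
    """Stands in for the DMA_ERROR transition."""


def dma_kv_read_to_fifo(kv_entry: list, key_size_bytes: int,
                        kv_read_ok: bool = True) -> list:
    if key_size_bytes > MAX_KEY_SIZE_BYTES:
        raise DmaKvReadError("key_size exceeds 512 bits")
    if not kv_read_ok:
        # Raised on the first KV-to-DMA cycle; no AXI transfer is issued.
        raise DmaKvReadError("KV read error")
    wanted = key_size_bytes // 4  # assumes a dword-aligned key_size
    fifo = []
    for offset, dword in enumerate(kv_entry[:KV_ENTRY_DWORDS]):
        if offset < wanted:
            fifo.append(dword)
        # DWORDs beyond key_size are still read from the KV but dropped,
        # so the read completes instead of hanging.
    return fifo
```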
	2787	+
	2788	+
2406	2789
2407	2790	## Cryptographic blocks fatal and non-fatal errors
2408	2791
@@ -2485,10 +2868,11 @@
2485	2868	9. Coron, J.-S.: Resistance against differential power analysis for elliptic curve cryptosystems. In: Koç, Ç.K., Paar, C. (eds.) CHES 1999. LNCS, vol. 1717, pp. 292–302.
2486	2869	10. Schindler, W., Wiemers, A.: Efficient side-channel attacks on scalar blinding on elliptic curves with special structure. In: NIST Workshop on ECC Standards (2015).
2487	2870	11. National Institute of Standards and Technology, "Digital Signature Standard (DSS)", Federal Information Processing Standards Publication (FIPS PUB) 186-4, July 2013.
2488		-12. NIST SP 800-90A, Rev 1: "Recommendation for Random Number Generation Using Deterministic Random Bit Generators", 2012. \|
	2871	+12. NIST SP 800-90A, Rev 1: "Recommendation for Random Number Generation Using Deterministic Random Bit Generators", 2012.
2489	2872	13. CHIPS Alliance, “RISC-V VeeR EL2 Programmer’s Reference Manual” \[Online\] Available at https://github.com/chipsalliance/Cores-VeeR-EL2/blob/main/docs/RISC-V_VeeR_EL2_PRM.pdf.
2490	2873	14. “The RISC-V Instruction Set Manual, Volume I: User-Level ISA, Document Version 20191213”, Editors Andrew Waterman and Krste Asanović, RISC-V Foundation, December 2019. Available at https://riscv.org/technical/specifications/.
2491	2874	15. “The RISC-V Instruction Set Manual, Volume II: Privileged Architecture, Document Version 20211203”, Editors Andrew Waterman, Krste Asanović, and John Hauser, RISC-V International, December 2021. Available at https://riscv.org/technical/specifications/.
2492		-16. NIST SP 800-56A, Rev 3: "Recommendation for Pair-Wise Key-Establishment Schemes Using Discrete Logarithm Cryptography", 2018, \|
	2875	+16. NIST SP 800-56A, Rev 3: "Recommendation for Pair-Wise Key-Establishment Schemes Using Discrete Logarithm Cryptography", 2018.
	2876	+17. NIST FIPS 202: "SHA-3 Standard: Permutation-Based Hash and Extendable-Output Functions", 2015. Available at: [https://csrc.nist.gov/pubs/fips/202/final](https://doi.org/10.6028/NIST.FIPS.202).
2493	2877
2494	2878	<sup>[1]</sup> _Caliptra. Spanish for “root cap” and describes the deepest part of the root_

Changes to Hardware Specification

Image Changes (v2.0 → v2.1)

- Caliptra_boot_fsm.png
- Crypto-2p0.png
- HW_mbox_boot_fsm.png
- mbox_boot_fsm_FW_update_reset.png