Diff: Hardware Specification

@@ -1,12 +1,12 @@
1	1	<div style="font-size: 0.85em; color: #656d76; margin-bottom: 1em; padding: 0.5em; background: #f6f8fa; border-radius: 4px;">
2		-📄 Source: <a href="https://github.com/chipsalliance/caliptra-rtl/blob/5f85fb4bc95b753a2f7d042db7dc2644ca1e8c49/docs/CaliptraHardwareSpecification.md" target="_blank">chipsalliance/caliptra-rtl/docs/CaliptraHardwareSpecification.md</a> @ <code>5f85fb4</code>
	2	+📄 Source: <a href="https://github.com/chipsalliance/caliptra-rtl/blob/35b0bc5691b2bd0fc180403914cfabe207379089/docs/CaliptraHardwareSpecification.md" target="_blank">chipsalliance/caliptra-rtl/docs/CaliptraHardwareSpecification.md</a> @ <code>35b0bc5</code>
3	3	</div>
4	4
5	5	![OCP Logo](../images/caliptra-rtl/docs/images/OCP_logo.png)
6	6
7	7	<p style="text-align: center;">Caliptra Hardware Specification</p>
8	8
9		-<p style="text-align: center;">Version 1.1</p>
	9	+<p style="text-align: center;">Revision 2.0.3</p>
10	10
11	11	<div style="page-break-after: always"></div>
12	12
@@ -21,6 +21,23 @@
21	21	# Caliptra Core
22	22
23	23	For information on the Caliptra Core, see the [High level architecture](https://chipsalliance.github.io/Caliptra/doc/Caliptra.html#high-level-architecture) section of [Caliptra: A Datacenter System on a Chip (SoC) Root of Trust (RoT)](https://chipsalliance.github.io/Caliptra/doc/Caliptra.html).
	24	+
	25	+## Key Caliptra Core 2.0 Changes
	26	+* AXI subordinate replaces APB interface of Caliptra 1.x hardware
	27	+* SHA Accelerator functionality now available exclusively to Caliptra
	28	+ * Caliptra uC may use it internally in mailbox mode or via the Caliptra AXI DMA assist engine in streaming mode
	29	+ * SHA Accelerator adds new SHA save/restore functionality
	30	+* Adams Bridge Dilithium/ML-DSA (refer to [Adams bridge spec](https://github.com/chipsalliance/adams-bridge/blob/main/docs/AdamsBridgeHardwareSpecification.md))
	31	+* Subsystem mode support (refer to [Subsystem Specification](https://github.com/chipsalliance/caliptra-ss/blob/main/docs/Caliptra%202.0%20Subsystem%20Specification%201.pdf) for details)
	32	+ * ECDH hardware support
	33	+ * HMAC512 hardware support
	34	+ * AXI Manager with DMA support (refer to [DMA Specification](https://github.com/chipsalliance/caliptra-ss/blob/main/docs/CaliptraSSHardwareSpecification.md#caliptra-axi-manager--dma-assist))
	35	+ * Manufacturing and Debug Unlock
	36	+ * UDS programming
	37	+ * Read logic for Secret Fuses
	38	+ * Streaming Boot Support
	39	+* RISC-V core PMP support
	40	+* CSR HMAC key for manufacturing flow
24	41
25	42	## Boot FSM
26	43
@@ -57,12 +74,13 @@
57	74	\| Parameter \| Configuration \|
58	75	\| :---------------------- \| :------------ \|
59	76	\| Interface \| AHB-Lite \|
60		-\| DCCM \| 128 KiB \|
61		-\| ICCM \| 128 KiB \|
	77	+\| DCCM \| 256 KiB \|
	78	+\| ICCM \| 256 KiB \|
62	79	\| I-Cache \| Disabled \|
63	80	\| Reset Vector \| 0x00000000 \|
64	81	\| Fast Interrupt Redirect \| Enabled \|
65	82	\| External Interrupts \| 31 \|
	83	+\| PMP \| Enabled \|
66	84
67	85
68	86	### Embedded memory export
@@ -75,12 +93,12 @@
75	93
76	94	\| Subsystem \| Address size \| Start address \| End address \|
77	95	\| :------------------ \| :----------- \| :------------ \| :---------- \|
78		-\| ROM \| 48 KiB \| 0x0000_0000 \| 0x0000_BFFF \|
	96	+\| ROM \| 96 KiB \| 0x0000_0000 \| 0x0000_BFFF \|
79	97	\| Cryptographic \| 512 KiB \| 0x1000_0000 \| 0x1007_FFFF \|
80	98	\| Peripherals \| 32 KiB \| 0x2000_0000 \| 0x2000_7FFF \|
81		-\| SoC IFC \| 256 KiB \| 0x3000_0000 \| 0x3003_FFFF \|
82		-\| RISC-V Core ICCM \| 128 KiB \| 0x4000_0000 \| 0x4001_FFFF \|
83		-\| RISC-V Core DCCM \| 128 KiB \| 0x5000_0000 \| 0x5001_FFFF \|
	99	+\| SoC IFC \| 512 KiB \| 0x3000_0000 \| 0x3007_FFFF \|
	100	+\| RISC-V Core ICCM \| 256 KiB \| 0x4000_0000 \| 0x4003_FFFF \|
	101	+\| RISC-V Core DCCM \| 256 KiB \| 0x5000_0000 \| 0x5003_FFFF \|
84	102	\| RISC-V MM CSR (PIC) \| 256 MiB \| 0x6000_0000 \| 0x6FFF_FFFF \|
85	103
86	104
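The updated address map can be exercised with a small lookup, sketched here in Python. The rows are copied from the post-change side of the table above; the names `ADDRESS_MAP`, `decode`, and `size_mismatches` are illustrative and do not come from the specification.

```python
# Post-change rows of the address map: (name, stated size in KiB, start, end).
ADDRESS_MAP = [
    ("ROM",                 96,         0x0000_0000, 0x0000_BFFF),
    ("Cryptographic",       512,        0x1000_0000, 0x1007_FFFF),
    ("Peripherals",         32,         0x2000_0000, 0x2000_7FFF),
    ("SoC IFC",             512,        0x3000_0000, 0x3007_FFFF),
    ("RISC-V Core ICCM",    256,        0x4000_0000, 0x4003_FFFF),
    ("RISC-V Core DCCM",    256,        0x5000_0000, 0x5003_FFFF),
    ("RISC-V MM CSR (PIC)", 256 * 1024, 0x6000_0000, 0x6FFF_FFFF),
]


def decode(addr):
    """Return the subsystem whose [start, end] range contains addr, or None."""
    for name, _kib, start, end in ADDRESS_MAP:
        if start <= addr <= end:
            return name
    return None


def size_mismatches():
    """Return the rows whose stated size differs from end - start + 1."""
    return [name for name, kib, start, end in ADDRESS_MAP
            if end - start + 1 != kib * 1024]


print(decode(0x3004_0000))   # falls in the enlarged SoC IFC range
print(size_mismatches())
```

With the values as shown in this diff, the check reports the ROM row: its stated size is 96 KiB, while 0x0000_0000 to 0x0000_BFFF spans 48 KiB. The diff alone does not show which of the two values is intended.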
@@ -92,12 +110,14 @@
92	110	\| :---------------------------------- \| :-------- \| :----------- \| :------------ \| :---------- \|
93	111	\| Cryptographic Initialization Engine \| 0 \| 32 KiB \| 0x1000_0000 \| 0x1000_7FFF \|
94	112	\| ECC Secp384 \| 1 \| 32 KiB \| 0x1000_8000 \| 0x1000_FFFF \|
95		-\| HMAC384 \| 2 \| 4 KiB \| 0x1001_0000 \| 0x1001_0FFF \|
	113	+\| HMAC512 \| 2 \| 4 KiB \| 0x1001_0000 \| 0x1001_0FFF \|
96	114	\| Key Vault \| 3 \| 8 KiB \| 0x1001_8000 \| 0x1001_9FFF \|
97	115	\| PCR Vault \| 4 \| 8 KiB \| 0x1001_A000 \| 0x1001_BFFF \|
98	116	\| Data Vault \| 5 \| 8 KiB \| 0x1001_C000 \| 0x1001_DFFF \|
99	117	\| SHA512 \| 6 \| 32 KiB \| 0x1002_0000 \| 0x1002_7FFF \|
100		-\| SHA256 \| 13 \| 32 KiB \| 0x1002_8000 \| 0x1002_FFFF \|
	118	+\| SHA256 \| 10 \| 32 KiB \| 0x1002_8000 \| 0x1002_FFFF \|
	119	+\| ML-DSA \| 14 \| 64 KiB \| 0x1003_0000 \| 0x1003_FFFF \|
	120	+\| AES \| 15 \| 4 KiB \| 0x1001_1000 \| 0x1001_1FFF \|
101	121
102	122
103	123	#### Peripherals subsystem
@@ -106,10 +126,8 @@
106	126
107	127	\| IP/Peripheral \| Target \# \| Address size \| Start address \| End address \|
108	128	\| :------------ \| :-------- \| :----------- \| :------------ \| :---------- \|
109		-\| QSPI \| 7 \| 4 KiB \| 0x2000_0000 \| 0x2000_0FFF \|
110		-\| UART \| 8 \| 4 KiB \| 0x2000_1000 \| 0x2000_1FFF \|
111		-\| CSRNG \| 15 \| 4 KiB \| 0x2000_2000 \| 0x2000_2FFF \|
112		-\| ENTROPY SRC \| 16 \| 4 KiB \| 0x2000_3000 \| 0x2000_3FFF \|
	129	+\| CSRNG \| 12 \| 4 KiB \| 0x2000_2000 \| 0x2000_2FFF \|
	130	+\| ENTROPY SRC \| 13 \| 4 KiB \| 0x2000_3000 \| 0x2000_3FFF \|
113	131
114	132
115	133	#### SoC interface subsystem
@@ -118,10 +136,11 @@
118	136
119	137	\| IP/Peripheral \| Target \# \| Address size \| Start address \| End address \|
120	138	\| :------------------------- \| :-------- \| :----------- \| :------------ \| :---------- \|
121		-\| Mailbox SRAM Direct Access \| 10 \| 128 KiB \| 0x3000_0000 \| 0x3001_FFFF \|
122		-\| Mailbox CSR \| 10 \| 4 KiB \| 0x3002_0000 \| 0x3002_0FFF \|
123		-\| SHA512 Accelerator CSR \| 10 \| 4 KiB \| 0x3002_1000 \| 0x3002_1FFF \|
124		-\| Mailbox \| 10 \| 64 KiB \| 0x3003_0000 \| 0x3003_FFFF \|
	139	+\| Mailbox CSR \| 7 \| 4 KiB \| 0x3002_0000 \| 0x3002_0FFF \|
	140	+\| SHA512 Accelerator \| 7 \| 4 KiB \| 0x3002_1000 \| 0x3002_1FFF \|
	141	+\| AXI DMA \| 7 \| 4 KiB \| 0x3002_2000 \| 0x3002_2FFF \|
	142	+\| SOC IFC CSR \| 7 \| 64 KiB \| 0x3003_0000 \| 0x3003_FFFF \|
	143	+\| Mailbox SRAM Direct Access \| 7 \| 256 KiB \| 0x3004_0000 \| 0x3007_FFFF \|
125	144
126	145
127	146	#### RISC-V core local memory blocks
@@ -130,8 +149,8 @@
130	149
131	150	\| IP/Peripheral \| Target \# \| Address size \| Start address \| End address \|
132	151	\| :-------------- \| :-------- \| :----------- \| :------------ \| :---------- \|
133		-\| ICCM0 (via DMA) \| 12 \| 128 KiB \| 0x4000_0000 \| 0x4001_FFFF \|
134		-\| DCCM \| 11 \| 128 KiB \| 0x5000_0000 \| 0x5001_FFFF \|
	152	+\| ICCM0 (via DMA) \| 9 \| 256 KiB \| 0x4000_0000 \| 0x4003_FFFF \|
	153	+\| DCCM \| 8 \| 256 KiB \| 0x5000_0000 \| 0x5003_FFFF \|
135	154
136	155
137	156	### Interrupts
@@ -171,14 +190,16 @@
171	190	\| SHA512 (Notifications) \| 10 \| 7 \|
172	191	\| SHA256 (Errors) \| 11 \| 8 \|
173	192	\| SHA256 (Notifications) \| 12 \| 7 \|
174		-\| QSPI (Errors) \| 13 \| 4 \|
175		-\| QSPI (Notifications) \| 14 \| 3 \|
176		-\| UART (Errors) \| 15 \| 4 \|
177		-\| UART (Notifications) \| 16 \| 3 \|
178		-\| RESERVED \| 17 \| 4 \|
179		-\| RESERVED \| 18 \| 3 \|
	193	+\| RESERVED \| 13, 15, 17 \| 4 \|
	194	+\| RESERVED \| 14, 16, 18 \| 3 \|
180	195	\| Mailbox (Errors) \| 19 \| 8 \|
181	196	\| Mailbox (Notifications) \| 20 \| 7 \|
	197	+\| SHA512 Accelerator (Errors) \| 23 \| 8 \|
	198	+\| SHA512 Accelerator (Notifications) \| 24 \| 7 \|
	199	+\| MLDSA (Errors) \| 23 \| 8 \|
	200	+\| MLDSA (Notifications) \| 24 \| 7 \|
	201	+\| AXI DMA (Errors) \| 25 \| 8 \|
	202	+\| AXI DMA (Notifications) \| 26 \| 7 \|
182	203
183	204
184	205	## Watchdog timer
@@ -230,182 +251,18 @@
230	251
231	252	As a result of this implementation, 64-bit data transfers are not supported on the Caliptra AHB fabric. Firmware running on the internal microprocessor may only access memory and registers using a 32-bit or smaller request size, as 64-bit transfer requests will be corrupted.
232	253
	254	+All AHB requests internal to Caliptra must target an address that is aligned to the native data width of 4 bytes. Any AHB read or write by the Caliptra RISC-V processor that is not aligned to this boundary will fail to decode to the targeted register, will fail to write the submitted data, and will return read data of all zeroes. All AHB requests must also use the native size of 4 bytes (encoded in the hsize signal with a value of 2). The only exception to this is when the RISC-V processor performs byte-aligned, single-byte reads to the Mailbox SRAM using the direct-access mechanism described in [SoC Mailbox](#soc-mailbox). In this case, a byte-aligned address must be accompanied by the correct size indicator for a single-byte access. Read addresses for byte accesses are aligned to the 4-byte boundary in hardware, and will successfully complete with the correct data at the specified byte offset. Direct mode SRAM writes must be 4 bytes in size and must be aligned to the 4-byte boundary. Hardware writes the entire dword of data to the aligned address, so attempts to write a partial word of data may result in data corruption.
	255	+
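The access rules in the added paragraph can be summarized as a predicate. The following Python sketch is a behavioral model only, not RTL; `ahb_access_ok` is a hypothetical helper, and the direct-access range is taken from the post-change SoC interface subsystem table (0x3004_0000 to 0x3007_FFFF).

```python
# Mailbox SRAM direct-access window, from the SoC interface subsystem table.
MBOX_SRAM_DIRECT = range(0x3004_0000, 0x3008_0000)


def ahb_access_ok(addr, size_bytes, is_write):
    """Return True if the request follows the AHB rules described above."""
    if size_bytes == 4:
        # Native size (hsize = 2): the address must be 4-byte aligned.
        return addr % 4 == 0
    if size_bytes == 1 and not is_write:
        # Only exception: single-byte reads of the mailbox SRAM in direct mode.
        return addr in MBOX_SRAM_DIRECT
    # Everything else (64-bit transfers, byte writes, halfwords) is rejected.
    return False


print(ahb_access_ok(0x3004_0001, 1, is_write=False))
print(ahb_access_ok(0x3004_0001, 1, is_write=True))
```

The model only classifies requests. It does not reproduce the failure behavior (failed decode, dropped write data, zero read data) that the hardware exhibits for a rejected access.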
233	256	## Cryptographic subsystem
234	257
235	258	For details, see the [Cryptographic subsystem architecture](#cryptographic-subsystem-architecture) section.
236	259
237		-## Peripherals subsystem
238		-
239		-Caliptra includes QSPI and UART peripherals that are used to facilitate alternative operating modes and debug. In the first generation, Caliptra does not support enabling the QSPI interface. Similarly, the UART interface exists to facilitate firmware debug in an FPGA prototype, but should be disabled in final silicon. SystemVerilog defines used to disable these peripherals are described in the [Caliptra Integration Specification](https://github.com/chipsalliance/caliptra-rtl/blob/main/docs/CaliptraIntegrationSpecification.md). Operation of these peripherals is described in the following sections.
240		-
241		-### QSPI Flash Controller
242		-
243		-Caliptra implements a QSPI block that can communicate with 2 QSPI devices. This QSPI block is accessible to FW over the AHB-lite Interface.
244		-
245		-The QSPI block is composed of the spi\_host implementation. For information, see the [SPI\_HOST HWIP Technical Specification](https://opentitan.org/book/hw/ip/spi_host/index.html). The core code (see [spi\_host](https://github.com/lowRISC/opentitan/tree/master/hw/ip/spi_host)) is reused but the interface to the module is changed to AHB-lite and the number of chip select lines supported is increased to 2. The design provides support for Standard SPI, Dual SPI, or Quad SPI commands. The following figure shows the QSPI flash controller.
246		-
247		-Figure 4: QSPI flash controller
248		-
249		-![](../images/caliptra-rtl/docs/images/QSPI_flash.png)
250		-
251		-#### Operation
252		-
253		-Transactions flow through the QSPI block starting with AHB-lite writes to the TXDATA FIFO. Commands are then written and processed by the control FSM, orchestrating transmissions from the TXDATA FIFO and receiving data into the RXDATA FIFO.
254		-
255		-The structure of a command depends on the device and the command itself. In the case of a standard SPI device, the host IP always transmits data on qspi\_d\_io[0] and always receives data from the target device on qspi\_d\_io[1]. In Dual or Quad modes, all data lines are bi-directional, thus allowing full bandwidth in transferring data across 4 data lines.
256		-
257		-A typical SPI command consists of different segments that are combined as shown in the following example. Each segment can configure the length, speed, and direction. As an example, the following SPI read transaction consists of 2 segments.
258		-
259		-Figure 5: SPI read transaction segments
260		-
261		-![](../images/caliptra-rtl/docs/images/SPI_read.png)
262		-
263		-\| Segment \# \| Length (Bytes) \| Speed \| Direction \| TXDATA FIFO \| RXDATA FIFO \|
264		-\| :--------- \| :------------- \| :------- \| :---------------- \| :----------- \| :----------------- \|
265		-\| 1 \| 4 \| standard \| TX <br>qspi_d_io\[0\] \| \[0\] 0x3 (ReadData) <br>\[1\] Addr\[23:16\] <br>\[2\] Addr\[15:8\] <br>\[3\] Addr\[7:0\] \|\|
266		-\| 2 \| 1 \| standard \| RX <br>qspi_d_io\[1\] \|\| \[0\] Data \[7:0\] \|
267		-
268		-
269		-In this example, the ReadData (0x3) command was written to the TXDATA FIFO, followed by the 3B address. This maps to a total of 4 bytes that are transmitted out across qspi\_d\_io[0] in the first segment. The second segment consists of a read command that receives 1 byte of data from the target device across qspi\_d\_io[1].
270		-
271		-QSPI consists of up to four command segments in which the host:
272		-
273		-1. Transmits instructions or data at the standard rate
274		-2. Transmits instructions address or data on 2 or 4 data lines
275		-3. Holds the bus in a high-impedance state for some number of dummy cycles where neither side transmits
276		-4. Receives information from the target device at the specified rate (derived from the original command)
277		-
278		-The following example shows the QSPI segments.
279		-
280		-Figure 6: QSPI segments
281		-
282		-![](../images/caliptra-rtl/docs/images/QSPI_segments.png)
283		-
284		-\| Segment \# \| Length (Bytes) \| Speed \| Direction \| TXDATA FIFO \| RXDATA FIFO \|
285		-\| :--------- \| :------------- \| :------- \| :------------------ \| :----------- \| :---------------- \|
286		-\| 1 \| 1 \| standard \| TX <br>qspi_d_io\[3:0\] \| \[0\] 0x6B (ReadDataQuad) \|\|
287		-\| 2 \| 3\* \| quad \| TX <br>qspi_d_io\[3:0\] \| \[1\] Addr\[23:16\] <br>\[2\] Addr\[15:8\] <br>\[3\] Addr\[7:0\] \|\|
288		-\| 3 \| 2 \| N/A \| None (Dummy) \|\|\|
289		-\| 4 \| 1 \| quad \| RX <br>qspi_d_io\[3:0\] \|\| \[0\] Data\[7:0\] \|
290		-
291		-
292		-Note: In the preceding figure, segment 2 doesn’t show bytes 2 and 3 for brevity.
293		-
294		-#### Configuration
295		-
296		-The CONFIGOPTS multi-register has one entry per CSB line and holds clock configuration and timing settings that are specific to each peripheral. After the CONFIGOPTS multi-register is programmed for each SPI peripheral device, the values can be left unchanged.
297		-
298		-The most common differences between target devices are the requirements for a specific SPI clock phase or polarity, CPOL and CPHA. These clock parameters can be set via the CONFIGOPTS.CPOL or CONFIGOPTS.CPHA register fields.
299		-
300		-The SPI clock rate depends on the peripheral clock and a 16b clock divider configured by CONFIGOPTS.CLKDIV. The following equation is used to configure the SPI clock period:
301		-
302		-![](../images/caliptra-rtl/docs/images/Caliptra_eq_SPI_clk_period.png)
303		-
304		-By default, CLKDIV is set to 0, which means that the maximum frequency that can be achieved is at most half the frequency of the peripheral clock (Fsck = Fclk/2).
305		-
306		-We can rearrange the equation to solve for the CLKDIV:
307		-
308		-![](../images/caliptra-rtl/docs/images/Caliptra_eq_CLKDIV.png)
309		-
310		-Assuming a 400MHz target peripheral, and a SPI clock target of 100MHz:
311		-
312		-CONFIGOPTS.CLKDIV = (400/(2\*100)) -1 = 1
313		-
314		-The following figure shows CONFIGOPTS.
315		-
316		-Figure 7: CONFIGOPTS
317		-
318		-![](../images/caliptra-rtl/docs/images/CONFIGOPTS.png)
319		-
320		-#### Signal descriptions
321		-
322		-The QSPI block architecture inputs and outputs are described in the following table.
323		-
324		-\| Name \| Input or output \| Description \|
325		-\| :------------------ \| :-------------- \| :-------------------------------------------------------- \|
326		-\| clk_i \| input \| All signal timings are related to the rising edge of clk. \|
327		-\| rst_ni \| input \| The reset signal is active LOW and resets the core. \|
328		-\| cio_sck_o \| output \| SPI clock \|
329		-\| cio_sck_en_o \| output \| SPI clock enable \|
330		-\| cio_csb_o\[1:0\] \| output \| Chip select \# (one hot, active low) \|
331		-\| cio_csb_en_o\[1:0\] \| output \| Chip select \# enable (one hot, active low) \|
332		-\| cio_csb_sd_o\[3:0\] \| output \| SPI data output \|
333		-\| cio_csb_sd_en_o \| output \| SPI data output enable \|
334		-\| cio_csb_sd_i\[3:0\] \| input \| SPI data input \|
335		-
336		-
337		-#### SPI\_HOST IP programming guide
338		-
339		-The operation of the SPI\_HOST IP proceeds in seven general steps.
340		-
341		-To initialize the IP:
342		-
343		-1. Program the CONFIGOPTS multi-register with the appropriate timing and polarity settings for each csb line.
344		-2. Set the desired interrupt parameters.
345		-3. Enable the IP.
346		-
347		-Then for each command:
348		-
349		-4. Load the data to be transmitted into the FIFO using the TXDATA memory window.
350		-5. Specify the target device by programming the CSID.
351		-6. Specify the structure of the command by writing each segment into the COMMAND register.
352		-
353		- For multi-segment transactions, assert COMMAND.CSAAT for all but the last command segment.
354		-
355		-7. For transactions that expect to receive a reply, the data can then be read back from the RXDATA window.
356		-
357		-Steps 4-7 are then repeated for each subsequent command.
358		-
359		-### UART
360		-
361		-Caliptra implements a UART block that can communicate with a serial device that is accessible to FW over the AHB-lite Interface. This is a configuration that the SoC opts-in by defining CALIPTRA\_INTERNAL\_UART.
362		-
363		-The UART block is composed of the uart implementation. For information, see the [UART HWIP Technical Specification](https://opentitan.org/book/hw/ip/uart/). The design provides support for a programmable baud rate. The UART block is shown in the following figure.
364		-
365		-Figure 8: UART block
366		-
367		-![](../images/caliptra-rtl/docs/images/UART_block.png)
368		-
369		-#### Operation
370		-
371		-Transactions flow through the UART block starting with an AHB-lite write to WDATA, which triggers the transmit module to start a UART TX serial data transfer. The TX module dequeues the byte from the internal FIFO and shifts it out bit by bit at the baud rate. If TX is not enabled, the output is set high and WDATA in the FIFO is queued up.
372		-
373		-The following figure shows the transmit data on the serial lane, starting with the START bit, which is indicated by a high to low transition, followed by the 8 bits of data.
374		-
375		-Figure 9: Serial transmission frame
376		-
377		-![](../images/caliptra-rtl/docs/images/serial_transmission.png)
378		-
379		-On the receive side, after the START bit is detected, the data is sampled at the center of each data bit and stored into a FIFO. A user can monitor the FIFO status and read the data out of RDATA.
380		-
381		-#### Configuration
382		-
383		-The baud rate can be configured using the CTRL.NCO register field. This should be set using the following equation:
384		-
385		-![](../images/caliptra-rtl/docs/images/Caliptra_eq_NCO.png)
386		-
387		-If the desired baud rate is 115,200bps:
388		-
389		-![](../images/caliptra-rtl/docs/images/Caliptra_eq_UART.png)
390		-
391		-![](../images/caliptra-rtl/docs/images/Caliptra_eq_UART2.png)
392		-
393		-#### Signal descriptions
394		-
395		-The UART block architecture inputs and outputs are described in the following table.
396		-
397		-\| Name \| Input or output \| Description \|
398		-\| :------- \| :-------------- \| :-------------------------------------------------------- \|
399		-\| clk_i \| input \| All signal timings are related to the rising edge of clk. \|
400		-\| rst_ni \| input \| The reset signal is active LOW and resets the core. \|
401		-\| cio_rx_i \| input \| Serial receive bit \|
402		-\| cio_tx_o \| output \| Serial transmit bit \|
403		-
404		-
405	260	## SoC mailbox
406	261
407	262	For more information on the mailbox protocol, see [Mailbox](https://github.com/chipsalliance/caliptra-rtl/blob/main/docs/CaliptraIntegrationSpecification.md#mailbox) in the Caliptra Integration Specification. Mailbox registers accessible to the Caliptra microcontroller are defined in [internal-regs/mbox_csr](https://chipsalliance.github.io/caliptra-rtl/main/internal-regs/?p=clp.mbox_csr).
408	263
	264	+The RISC-V processor is able to access the SoC mailbox SRAM using a direct access mode (which bypasses the defined mailbox protocol). The addresses for performing this access are described in [SoC interface subsystem](#soc-interface-subsystem) and in [mbox_sram](https://chipsalliance.github.io/caliptra-rtl/main/internal-regs/?p=clp.mbox_sram). In this mode, firmware must first acquire the mailbox lock. Then, reads and writes to the direct access address region will go directly to the SRAM block. Firmware must release the mailbox lock by writing to the [mbox_unlock](https://chipsalliance.github.io/caliptra-rtl/main/internal-regs/?p=clp.mbox_csr.mbox_unlock) register after direct access operations are completed.
	265	+
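The acquire, access, release order for direct mode can be sketched as a context manager. This Python model uses an in-memory stand-in for the mailbox; `FakeMailbox` and `direct_access` are hypothetical names, and the lock semantics (a read that returns 0 grants the lock, a write of 1 to the unlock register releases it) are an assumption that this diff does not state for the mailbox.

```python
from contextlib import contextmanager


class FakeMailbox:
    """In-memory stand-in for the mailbox lock and SRAM (not the real RTL)."""

    def __init__(self, words):
        self.locked = False
        self.sram = [0] * words

    def read_lock(self):
        # Assumed semantics: 0 means the lock was granted to this reader.
        if self.locked:
            return 1
        self.locked = True
        return 0

    def write_unlock(self, value):
        # Assumed semantics: writing 1 releases the lock.
        if value == 1:
            self.locked = False


@contextmanager
def direct_access(mbox):
    """Hold the mailbox lock for the duration of direct SRAM accesses."""
    if mbox.read_lock() != 0:
        raise RuntimeError("mailbox lock is held by another agent")
    try:
        yield mbox.sram
    finally:
        mbox.write_unlock(1)   # always release, even if an access raised


mbox = FakeMailbox(words=4)
with direct_access(mbox) as sram:
    sram[0] = 0xDEADBEEF       # whole-dword write, as direct mode requires
```

Releasing in a `finally` clause mirrors the requirement that the lock be released after direct access operations are completed, whatever the outcome of the accesses.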
409	266
410	267	## Security state
411	268
@@ -417,7 +274,7 @@
417	274
418	275	* Caliptra JTAG is opened for the microcontroller and HW debug.
419	276
420		-* Device secrets (UDS, FE, key vault, and obfuscation key) are programmed to debug values.
	277	+* Device secrets (UDS, FE, key vault, CSR HMAC key, and obfuscation key) are programmed to debug values.
421	278
422	279	If a transition to debug mode happens during ROM operation, any values computed from the use of device secrets may not match expected values.
423	280
@@ -428,11 +285,14 @@
428	285	\| Name \| Default value \|
429	286	\| :-------------------------- \| :------------ \|
430	287	\| Obfuscation Key Debug Value \| All 0x1 \|
	288	+\| CSR HMAC Key Debug Value \| All 0x1 \|
431	289	\| UDS Debug Value \| All 0x1 \|
432	290	\| Field Entropy Debug Value \| All 0x1 \|
433	291	\| Key Vault Debug Value 0 \| All 0xA \|
434	292	\| Key Vault Debug Value 1 \| All 0x5 \|
435	293
	294	+
	295	+Note: When entering debug or scan mode, all crypto engines are zeroized. Before starting any crypto operation in these modes, the status registers of all crypto engines must be checked to confirm they are ready. Failing to do so may trigger a fatal error caused by concurrent crypto operations.
436	296
437	297	## Clock gating
438	298
@@ -472,17 +332,17 @@
472	332
473	333	* JTAG accesses
474	334
475		-* APB transactions
476		-
477		-Activity on the APB interface only wakes up the SoC IFC clock. All other clocks remain off until any other condition is met or the core exits the halt state.
478		-
479		-\| Cpu_halt_status \| PSEL \| Generic input wires <br>\|\| fatal error <br>\|\| debug/scan mode <br> \|\|JTAG access \| Expected behavior \|
	335	+* AXI transactions
	336	+
	337	+Activity on the AXI subordinate interface only wakes up the SoC IFC clock. All other clocks remain off until any other condition is met or the core exits the halt state.
	338	+
	339	+\| Cpu_halt_status \| s_axi_active \| Generic input wires <br>\|\| fatal error <br>\|\| debug/scan mode <br> \|\|JTAG access \| Expected behavior \|
480	340	\| :-------------- \| :--- \| :---------- \| :-------------- \|
481	341	\| 0 \| X \| X \| All gated clocks active \|
482	342	\| 1 \| 0 \| 0 \| All gated clocks inactive \|
483	343	\| 1 \| 0 \| 1 \| All gated clocks active (as long as condition is true) \|
484		-\| 1 \| 1 \| 0 \| Soc_ifc_clk_cg active (as long as PSEL = 1) <br>All other clks inactive \|
485		-\| 1 \| 1 \| 1 \| Soc_ifc_clk_cg active (as long as condition is true OR PSEL = 1) <br>All other clks active (as long as condition is true) \|
	344	+\| 1 \| 1 \| 0 \| Soc_ifc_clk_cg active (as long as s_axi_active = 1) <br>All other clks inactive \|
	345	+\| 1 \| 1 \| 1 \| Soc_ifc_clk_cg active (as long as condition is true OR s_axi_active = 1) <br>All other clks active (as long as condition is true) \|
486	346
487	347
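The truth table above reduces to two boolean expressions. The following Python sketch models it; `gated_clocks` is an illustrative name, and `wake_condition` stands for the third column (generic input wires, fatal error, debug/scan mode, or JTAG access).

```python
def gated_clocks(cpu_halted, s_axi_active, wake_condition):
    """Return (soc_ifc_clk_cg_active, other_gated_clks_active)."""
    if not cpu_halted:
        # Row 1: the core is running, so every gated clock is active.
        return True, True
    # Halted: AXI activity wakes only the SoC IFC clock, while any other
    # wake condition wakes all gated clocks.
    return (s_axi_active or wake_condition), wake_condition


for row in [(0, 0, 0), (1, 0, 0), (1, 0, 1), (1, 1, 0), (1, 1, 1)]:
    print(row, gated_clocks(*map(bool, row)))
```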
488	348	### Usage
@@ -490,7 +350,7 @@
490	350	The following applies to the clock gating feature:
491	351
492	352	* The core should only be halted after all pending vault writes are done and cryptographic operations are complete.
493		-* While the core is halted, any APB transaction wakes up the SoC interface clock and leaves all other clocks disabled. If the core is still halted when the APB transactions are done, the SoC interface clock is returned to a disabled state.
	353	+* While the core is halted, any AXI transaction wakes up the SoC interface clock and leaves all other clocks disabled. If the core is still halted when the AXI transactions are done, the SoC interface clock is returned to a disabled state.
494	354	* The RDC clock is similar to an ungated clock and is only disabled when a reset event occurs. This avoids metastability on flops. The RDC clock operates independently of core halt status.
495	355
496	356
@@ -530,7 +390,7 @@
530	390
531	391	### Operation
532	392
533		-Requests for entropy bits start with [command requests](https://opentitan.org/book/hw/ip/csrng/doc/theory_of_operation.html#general-command-format) over the AHB-lite interface to the csrng [CMD\_REQ](https://chipsalliance.github.io/caliptra-rtl/main/internal-regs/?p=clp.csrng_reg.CMD_REQ) register.
	393	+Requests for entropy bits start with [command requests](https://opentitan.org/book/hw/ip/csrng/doc/theory_of_operation.html#general-command-format) over the AHB-lite interface to the csrng [CMD\_REQ](https://chipsalliance.github.io/caliptra-rtl/main/internal-regs/?p=clp.csrng_reg.CMD_REQ) register.
534	394
535	395	The following describes the fields of the command request header:
536	396
@@ -542,7 +402,7 @@
542	402
543	403	* Generate Length: Only defined for the generate command, this field is the total number of cryptographic entropy blocks requested. Each unit represents 128 bits of entropy returned. A value of 8 would return a total of 1024 bits. The maximum size supported is 4096.
544	404
545		-First an instantiate command is requested over the SW application interface to initialize an instance in the CSRNG module. Depending on the flag0 and clen fields in the command header, a request to the entropy\_src module over the entropy interface is sent to seed the csrng. This can take a few milliseconds if the seed entropy is not immediately available.
	405	+First an instantiate command is requested over the SW application interface to initialize an instance in the CSRNG module. Depending on the flag0 and clen fields in the command header, a request to the entropy\_src module over the entropy interface is sent to seed the csrng. This can take a few milliseconds if the seed entropy is not immediately available.
546	406
547	407	Example instantiation:
548	408
@@ -560,7 +420,7 @@
560	420	\| T \| 1-12 \| Only provided additional data is used as seed. \|
561	421
562	422
563		-Next a generate command is used to request generation of cryptographic entropy bits. The glen field defines how many 128 bit words are to be returned to the application interface. After the generated bits are ready, they can be read out via the GENBITS register. This register must be read out glen \* 4 times for each request made.
	423	+Next a generate command is used to request generation of cryptographic entropy bits. The glen field defines how many 128 bit words are to be returned to the application interface. After the generated bits are ready, they can be read out via the GENBITS register. This register must be read out glen \* 4 times for each request made.
564	424
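The arithmetic behind the generate length can be sketched in Python. The 32-bit `GENBITS` width is inferred from the "glen \* 4 reads per request" rule rather than stated directly, and the limit of 4096 is read here as the largest glen value; both are assumptions, as are the helper names.

```python
BITS_PER_BLOCK = 128   # each glen unit returns 128 bits of entropy
GENBITS_WIDTH = 32     # assumed: implied by glen * 4 reads per request
MAX_GLEN = 4096        # assumed to be the largest supported glen value


def entropy_bits(glen):
    """Total entropy bits returned by a generate command with this glen."""
    if not 0 <= glen <= MAX_GLEN:
        raise ValueError("glen out of range")
    return glen * BITS_PER_BLOCK


def genbits_reads(glen):
    """Number of GENBITS register reads needed to drain one request."""
    return entropy_bits(glen) // GENBITS_WIDTH


print(entropy_bits(8), genbits_reads(8))
```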
565	425	Example generate command:
566	426
@@ -634,6 +494,111 @@
634	494
635	495	The CSRNG may only be enabled if entropy\_src is enabled. After it is disabled, CSRNG may only be re-enabled after entropy\_src has been disabled and re-enabled.
636	496
	497	+### FIPS considerations
	498	+
	499	+The following sections illustrate the self-test parameter configuration. The
	500	+`entropy_src` block provides additional tests, but Caliptra focuses primarily
	501	+on the adaptive and repetition count tests, which are the ones strictly
	502	+required for FIPS compliance. Additional details can be found in NIST
	503	+publication SP 800-90B.
	504	+
	505	+The TRNG must be re-initialized whenever self-test parameter changes are
	506	+needed. As described in the previous section, the initialization steps
	507	+are as follows:
	508	+
	509	+1. Disable `csrng` and `entropy_src` in that order.
	510	+2. Apply new self-test configuration.
	511	+3. Enable `entropy_src` and `csrng` in that order.
	512	+
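The ordering constraint in the steps above can be sketched as follows. The Python model records calls against a stand-in device; `RecordingDevice` and `reinit_trng` are hypothetical names, and no real register interface is implied.

```python
class RecordingDevice:
    """Stand-in device that only logs the order of operations."""

    def __init__(self):
        self.log = []

    def disable(self, block):
        self.log.append(("disable", block))

    def enable(self, block):
        self.log.append(("enable", block))

    def configure(self, block, config):
        self.log.append(("configure", block))


def reinit_trng(dev, new_config):
    """Apply new self-test parameters using the documented ordering."""
    dev.disable("csrng")           # step 1: csrng first,
    dev.disable("entropy_src")     #         then entropy_src
    dev.configure("entropy_src", new_config)   # step 2
    dev.enable("entropy_src")      # step 3: entropy_src first,
    dev.enable("csrng")            #         then csrng


dev = RecordingDevice()
reinit_trng(dev, {"fips_window": 4096})
print(dev.log)
```

The disable order is the reverse of the enable order, consistent with the rule that CSRNG may only be enabled while `entropy_src` is enabled.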
	513	+### Adaptive self-test window and thresholds
	514	+
	515	+This section details the configuration of the `entropy_src`, focusing on how
	516	+the test window size for the adaptive self-test is determined and how it
	517	+relates to threshold calculations.
	518	+
	519	+#### Understanding Test Window Sizes
	520	+
	521	+The adaptive self-test within the `entropy_src` block utilizes a
	522	+configurable test window. To clarify its interpretation, two terms are
	523	+defined:
	524	+
	525	+* `ENTROPY_TEST_WINDOW`: This refers to the test window size directly
	526	+ configured in the hardware registers of the `entropy_src` block.
	527	+* `ACTUAL_TEST_WINDOW`: This refers to the effective window size used for
	528	+ the adaptive self-test threshold calculations. Its value depends on how
	529	+ the test scores are aggregated.
	530	+
	531	+The aggregation method is determined by the `CONF.THRESHOLD_SCOPE` setting in
	532	+the `entropy_src` block.
	533	+
	534	+#### Aggregate per symbol
	535	+
	536	+When `CONF.THRESHOLD_SCOPE` is enabled:
	537	+
	538	+* The adaptive test combines the inputs from all physical entropy lines
	539	+ into a single, cumulative score.
	540	+* The test essentially treats the combined input as a single binary stream,
	541	+ counting the occurrences of '1's.
	542	+* In this configuration:
	543	+ * If `ENTROPY_TEST_WINDOW` is set to 1024, then
	544	+ * `ACTUAL_TEST_WINDOW` = `ENTROPY_TEST_WINDOW` = 1024
	545	+
	546	+#### Handle each physical noise source separately
	547	+
	548	+When `CONF.THRESHOLD_SCOPE` is disabled:
	549	+
	550	+* The adaptive test scores each individual physical noise input line
	551	+ independently.
	552	+* This allows for monitoring the health of each noise source.
	553	+* In this configuration (assuming, for example, 4 noise sources):
	554	+ * If `ENTROPY_TEST_WINDOW` is set to 4096 bits, then
	555	+ * `ACTUAL_TEST_WINDOW` = (`ENTROPY_TEST_WINDOW` / 4) = 1024
	556	+
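The two aggregation cases can be captured in one helper. This Python sketch uses the example values from the text; `actual_test_window` is an illustrative name, and the default of 4 noise sources follows the example above.

```python
def actual_test_window(entropy_test_window, threshold_scope, noise_sources=4):
    """Effective window used for adaptive self-test threshold calculations."""
    if threshold_scope:
        # All lines are scored together as a single binary stream.
        return entropy_test_window
    if entropy_test_window % noise_sources:
        raise ValueError("window must divide evenly across noise sources")
    # Each physical noise source is scored on its own share of the window.
    return entropy_test_window // noise_sources


print(actual_test_window(1024, threshold_scope=True))
print(actual_test_window(4096, threshold_scope=False))
```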
	557	+#### Configuring adaptive self-test thresholds
	558	+
	559	+Once the `ACTUAL_TEST_WINDOW` is determined, the adaptive self-test
	560	+thresholds can be configured as follows:
	561	+
	562	+* `ADAPTP_HI_THRESHOLDS.FIPS_THRESH` = `adaptp_cutoff`
	563	+* `ADAPTP_LO_THRESHOLDS.FIPS_THRESH` = `ACTUAL_TEST_WINDOW` - `adaptp_cutoff`
	564	+
	565	+Here, `adaptp_cutoff` represents the pre-determined cutoff value for the
	566	+adaptive proportion test, as defined by NIST SP 800-90B. See the threshold
	567	+calculations below as an example.
	568	+
	569	+\$α = 2^{-40}\$ (recommended)\
	570	+\$H = 0.5\$ (example, estimated entropy measured from hardware)\
	571	+\$W\$ = `ACTUAL_TEST_WINDOW`\
	572	+`adaptp_cutoff` = \$1 + critbinom(W, 2^{-H}, 1 - α)\$
	573	+
	574	+> Note: The `critbinom` function (critical binomial distribution function) is
	575	+> implemented by most spreadsheet applications.
	576	+
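For readers without a spreadsheet at hand, the cutoff can also be computed directly. The Python sketch below implements `critbinom` as the smallest k whose cumulative binomial probability reaches the criterion, using exact integer arithmetic so that a criterion as close to 1 as 1 - 2^-40 is not lost to floating-point rounding. The function names are illustrative, and H = 0.5 is the example value from the text, not a measured figure.

```python
from fractions import Fraction
from math import comb


def critbinom(trials, p, q):
    """Smallest k with P(X <= k) >= q, for X ~ Binomial(trials, p)."""
    p_exact = Fraction(p)                 # exact value of the float p
    num, den = p_exact.numerator, p_exact.denominator
    total = den ** trials                 # common denominator of all terms
    q = Fraction(q)
    target = q.numerator * total
    cdf_num = 0                           # numerator of P(X <= k)
    for k in range(trials + 1):
        cdf_num += comb(trials, k) * num**k * (den - num) ** (trials - k)
        if cdf_num * q.denominator >= target:
            return k
    return trials


def adaptp_thresholds(actual_window, h, alpha):
    """Return (ADAPTP_HI, ADAPTP_LO) FIPS thresholds for the given window."""
    cutoff = 1 + critbinom(actual_window, 2.0 ** (-h), 1 - alpha)
    return cutoff, actual_window - cutoff


hi, lo = adaptp_thresholds(1024, h=0.5, alpha=Fraction(1, 2**40))
print(hi, lo)
```

The entropy estimate H must come from measurements of the actual noise source, so the printed thresholds are only meaningful for the example inputs.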
	577	+### Recommended configuration
	578	+
	579	+The following configuration is recommended for the adaptive and repetition
	580	+count tests:
	581	+
	582	+#### Adaptive test
	583	+
	584	+1. Set `CONF.THRESHOLD_SCOPE` to disabled. This allows the test to monitor
	585	+ and score each physical noise source individually, providing more granular
	586	+ health information.
	587	+2. Set `HEALTH_TEST_WINDOWS.FIPS_WINDOW` to 4096 bits. This value serves
	588	+ as the `ENTROPY_TEST_WINDOW`. With the current 4 noise source configuration,
	589	+ this is equivalent to 1024 bits per noise source, where each source produces
	590	+ 1 bit of entropy as defined in NIST SP 800-90B.
	591	+3. Calculate thresholds. Use an `ACTUAL_TEST_WINDOW` of 1024 bits (derived
	592	+ from step 2) in the adaptive test threshold formulas provided earlier in
	593	+ this subsection.
	594	+
	595	+#### Repetition count test
	596	+
	597	+The methodology used for calculating the repetition count threshold in the
	598	+ROM boot phase can be directly applied for this test as well. The threshold is
	599	+applied on a per-noise-source basis.
	600	+
	601	+
637	602	## External-TRNG REQ HW API
638	603
639	604	For SoCs that choose to not instantiate Caliptra’s integrated TRNG, Caliptra provides a TRNGREQ HW API.
@@ -647,18 +612,16 @@
647	612
648	613	## SoC-SHA accelerator HW API
649	614
650		-Caliptra provides a SHA accelerator HW API for SoC and Caliptra internal FW to use. It is atomic in nature in that only one of them can use the SHA accelerator HW API at the same time. Details of the SHA accelerator register block may be found in the GitHub repository in [documentation](https://chipsalliance.github.io/caliptra-rtl/main/external-regs/?p=caliptra_top_reg.sha512_acc_csr) generated from the register definition file.
	615	+Caliptra provides a SHA accelerator HW API for Caliptra internal FW to use via mailbox or via DMA operations through the AXI subordinate interface. The SHA accelerator HW API is restricted on AXI for use by Caliptra via the AXI DMA assist block; this access restriction is enforced by checking logic on the AXI AxUSER signal associated with the request.
651	616
652	617	Using the HW API:
653	618
654	619	* A user of the HW API first locks the accelerator by reading the LOCK register. A read that returns the value 0 indicates that the resource was locked for exclusive use by the requesting user. A write of ‘1 clears the lock.
655		-* The USER register captures the APB pauser value of the requestor that locked the SHA accelerator. This is the only user that is allowed to control the SHA accelerator by performing APB register writes. Writes by any other agent on the APB interface are dropped.
656		-* MODE register is written to set the SHA execution mode.
657		- * SHA accelerator supports both SHA384 and SHA512 modes of operation.
658		- * SHA supports streaming mode: SHA is computed on a stream of incoming data to the DATAIN register. The EXECUTE register, when set, indicates to the accelerator that streaming is complete. The accelerator can then publish the result into the DIGEST register. When the VALID bit of the STATUS register is set, then the result in the DIGEST register is valid.
659		- * SHA supports Mailbox mode: SHA is computed on LENGTH (DLEN) bytes of data stored in the mailbox beginning at START\_ADDRESS. This computation is performed when the EXECUTE register is set by the user. When the operation is completed and the result in the DIGEST register is valid, SHA accelerator sets the VALID bit of the STATUS register.
660		- * The SHA computation engine in the SHA accelerator requires big endian data, but the SHA accelerator can accommodate mailbox input data in either the little endian or big endian format. By default, input data is assumed to be little endian and is swizzled to big endian at the byte level prior to computation. For the big endian format, data is loaded into the SHA engine as-is. Users may configure the SHA accelerator to treat data as big endian by setting the ENDIAN\_TOGGLE bit appropriately.
661		- * See the register definition for the encodings.
	620	+* The USER register captures the AXI USERID value of the requestor that locked the SHA accelerator. This is the only user that is allowed to control the SHA accelerator by performing AXI register writes. Writes by any other agent on the AXI subordinate interface are dropped.
	621	+* SHA supports Mailbox mode: SHA is computed on LENGTH (DLEN) bytes of data stored in the mailbox beginning at START\_ADDRESS. This computation is performed when the EXECUTE register is set by the user. When the operation is completed and the result in the DIGEST register is valid, SHA accelerator sets the VALID bit of the STATUS register.
	622	+* Note that even though the mailbox size is fixed, the SHA save/restore enhancement removes any limit on the total size of the data to be hashed. The SoC must follow the FW API.
	623	+* The SHA computation engine in the SHA accelerator requires big endian data, but the SHA accelerator can accommodate mailbox input data in either the little endian or big endian format. By default, input data is assumed to be little endian and is swizzled to big endian at the byte level prior to computation. For the big endian format, data is loaded into the SHA engine as-is. Users may configure the SHA accelerator to treat data as big endian by setting the ENDIAN\_TOGGLE bit appropriately.
	624	+* See the register definition for the encodings.
662	625	* SHA engine also provides a ‘zeroize’ function through its CONTROL register to clear any of the SHA internal state. This can be used when the user wants to conceal previous state for debug or security reasons.
663	626
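The LOCK/USER behavior described above can be modeled in a few lines of software. This is a toy behavioral sketch, not RTL and not the register interface itself: the class name `ShaAccLockModel` is hypothetical, and returning 1 on a read while the accelerator is already locked is an assumption (the text only states that a read of 0 grants the lock).

```python
class ShaAccLockModel:
    """Toy model of the SHA accelerator LOCK and USER registers."""

    def __init__(self) -> None:
        self.locked = False
        self.user = None  # AXI USERID of the current lock owner

    def read_lock(self, axi_user: int) -> int:
        """Reading LOCK: 0 means the caller just acquired the lock."""
        if not self.locked:
            self.locked = True
            self.user = axi_user  # USER register captures the requester
            return 0
        return 1  # assumption: non-zero while someone holds the lock

    def write_lock(self, axi_user: int, value: int) -> bool:
        """Writing 1 to LOCK releases it; writes by other agents are dropped."""
        if self.locked and axi_user == self.user and value == 1:
            self.locked = False
            self.user = None
            return True
        return False  # write dropped


if __name__ == "__main__":
    acc = ShaAccLockModel()
    print(acc.read_lock(0xA))      # owner 0xA acquires
    print(acc.read_lock(0xB))      # 0xB is refused
    print(acc.write_lock(0xB, 1))  # dropped, 0xB is not the owner
    print(acc.write_lock(0xA, 1))  # owner releases
```
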
664	627	## JTAG implementation
@@ -683,7 +646,7 @@
683	646	* De-obfuscation engine
684	647	* SHA512/384 (based on NIST FIPS 180-4 [2])
685	648	* SHA256 (based on NIST FIPS 180-4 [2])
686		- * HMAC384 (based on [NIST FIPS 198-1](https://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.198-1.pdf) [5] and [RFC 4868](https://tools.ietf.org/html/rfc4868) [6])
	649	+ * HMAC512 (based on [NIST FIPS 198-1](https://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.198-1.pdf) [5] and [RFC 4868](https://tools.ietf.org/html/rfc4868) [6])
687	650	* Public-key cryptography
688	651	* NIST Secp384r1 Deterministic Digital Signature Algorithm (based on FIPS-186-4 [11] and RFC 6979 [7])
689	652	* Key vault
@@ -694,7 +657,7 @@
694	657
695	658	Figure 17: Caliptra cryptographic subsystem
696	659
697		-![](../images/caliptra-rtl/docs/images/crypto_subsystem.png)
	660	+![](../images/caliptra-rtl/docs/images/Crypto-2p0.png)
698	661
699	662	## SHA512/SHA384
700	663
@@ -927,13 +890,13 @@
927	890	\| 1 KiB message \| 8761 \| 21.90 \| 45,657 \|
928	891
929	892
930		-## HMAC384
931		-
932		-Hash-based message authentication code (HMAC) is a cryptographic authentication technique that uses a hash function and a secret key. HMAC involves a cryptographic hash function and a secret cryptographic key. This implementation supports HMAC-SHA-384-192 as specified in [NIST FIPS 198-1](https://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.198-1.pdf) [5]. The implementation is compatible with the HMAC-SHA-384-192 authentication and integrity functions defined in [RFC 4868](https://tools.ietf.org/html/rfc4868) [6].
933		-
934		-Caliptra HMAC implementation uses SHA384 as the hash function, accepts a 384-bit key, and generates a 384-bit tag.
935		-
936		-The implementation also supports PRF-HMAC-SHA-384. The PRF-HMAC-SHA-384 algorithm is identical to HMAC-SHA-384-192, except that variable-length keys are permitted, and the truncation step is not performed.
	893	+## HMAC512/HMAC384
	894	+
	895	+Hash-based message authentication code (HMAC) is a cryptographic authentication technique that uses a hash function and a secret cryptographic key. This implementation supports the HMAC512 variants HMAC-SHA-512-256 and HMAC-SHA-384-192 as specified in [NIST FIPS 198-1](https://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.198-1.pdf) [5]. The implementation is compatible with the HMAC-SHA-512-256 and HMAC-SHA-384-192 authentication and integrity functions defined in [RFC 4868](https://tools.ietf.org/html/rfc4868) [6].
	896	+
	897	+Caliptra HMAC implementation uses SHA512 as the hash function, accepts a 512-bit key, and generates a 512-bit tag.
	898	+
	899	+The implementation also supports PRF-HMAC-SHA-512. The PRF-HMAC-SHA-512 algorithm is identical to HMAC-SHA-512-256, except that variable-length keys are permitted, and the truncation step is not performed.
937	900
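The difference between the truncated variants and the PRF variants can be illustrated in software with Python's standard `hmac` module. This is only a reference sketch for comparison with the hardware results: the helper name `hmac_tag` and the sample key and message are illustrative, and truncation keeps the most significant (leftmost) half of the tag, as RFC 4868 describes.

```python
import hashlib
import hmac

_HASHES = {"sha384": hashlib.sha384, "sha512": hashlib.sha512}


def hmac_tag(key: bytes, msg: bytes, mode: str = "sha512",
             truncate: bool = False) -> bytes:
    """Compute an HMAC tag; optionally truncate to half length.

    truncate=False -> PRF-HMAC-SHA-512 / PRF-HMAC-SHA-384 (full tag)
    truncate=True  -> HMAC-SHA-512-256 / HMAC-SHA-384-192 (leftmost half)
    """
    full = hmac.new(key, msg, _HASHES[mode]).digest()
    return full[: len(full) // 2] if truncate else full


if __name__ == "__main__":
    key = bytes(64)      # illustrative all-zero 512-bit key
    msg = b"caliptra"    # illustrative message
    print(hmac_tag(key, msg, "sha512").hex())
    print(hmac_tag(key, msg, "sha512", truncate=True).hex())
```
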
938	901	The HMAC algorithm is described as follows:
939	902	* The key is fed to the HMAC core to be padded
@@ -980,9 +943,15 @@
980	943
981	944	#### Hashing
982	945
983		-The HMAC core performs the sha2-384 function to process the hash value of the given message. The algorithm processes each block of the 1024 bits from the message, using the result from the previous block. This data flow is shown in the following figure.
984		-
985		-Figure 28: HMAC-SHA-384-192 data flow
	946	+The HMAC512 core performs the sha2-512 function to process the hash value of the given message. The algorithm processes each block of the 1024 bits from the message, using the result from the previous block. This data flow is shown in the following figure.
	947	+
	948	+Figure 28: HMAC-SHA-512-256 data flow
	949	+
	950	+![](../images/caliptra-rtl/docs/images/HMAC_SHA_512_256.png)
	951	+
	952	+The HMAC384 core performs the sha2-384 function to process the hash value of the given message. The algorithm processes each block of the 1024 bits from the message, using the result from the previous block. This data flow is shown in the following figure.
	953	+
	954	+Figure 29: HMAC-SHA-384-192 data flow
986	955
987	956	![](../images/caliptra-rtl/docs/images/HMAC_SHA_384_192.png)
988	957
@@ -990,26 +959,33 @@
990	959
991	960	The HMAC architecture has the finite-state machine as shown in the following figure.
992	961
993		-Figure 29: HMAC FSM
	962	+Figure 30: HMAC FSM
994	963
995	964	![](../images/caliptra-rtl/docs/images/HMAC_FSM.png)
996	965
	966	+### CSR Mode
	967	+
	968	+When the CSR Mode register is set, the HMAC512 core uses the value latched from the cptra_csr_hmac_key interface pins in place of the API key register. These pins are latched internally after powergood assertion during DEVICE_MANUFACTURING lifecycle state. During debug mode operation this value is overridden with all 1's, and during any other lifecycle state it has a value of zero.
	969	+
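The key selection described for CSR mode can be summarized as a small decision function. This is a behavioral sketch only: the function names, the string encoding of the lifecycle state, and the assumption that debug mode takes precedence over the lifecycle state are illustrative and not taken from the RTL.

```python
KEY_BITS = 512
ALL_ONES = (1 << KEY_BITS) - 1


def csr_hmac_key_value(latched_pins: int, lifecycle: str,
                       debug_mode: bool) -> int:
    """Value presented as the CSR HMAC key (illustrative model)."""
    if debug_mode:                      # assumption: debug overrides lifecycle
        return ALL_ONES
    if lifecycle == "DEVICE_MANUFACTURING":
        return latched_pins & ALL_ONES  # value latched after powergood
    return 0                            # any other lifecycle state


def select_hmac_key(csr_mode: bool, api_key: int, latched_pins: int,
                    lifecycle: str, debug_mode: bool) -> int:
    """Pick the key the HMAC512 core uses for the operation."""
    if csr_mode:
        return csr_hmac_key_value(latched_pins, lifecycle, debug_mode)
    return api_key


if __name__ == "__main__":
    key = select_hmac_key(True, api_key=0x1234, latched_pins=0xABCD,
                          lifecycle="DEVICE_MANUFACTURING", debug_mode=False)
    print(hex(key))
```
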
997	970	### Signal descriptions
998	971
999	972	The HMAC architecture inputs and outputs are described in the following table.
1000	973
1001	974	\| Name \| Input or output \| Description \|
1002		-\| :----------------- \| :-------------- \| :----------- \|
	975	+\| :-------------------------- \| :-------------- \| :----------- \|
1003	976	\| clk \| input \| All signal timings are related to the rising edge of clk. \|
1004	977	\| reset_n \| input \| The reset signal is active LOW and resets the core. This is the only active LOW signal. \|
1005	978	\| init \| input \| The core is initialized and processes the key and the first block of the message. \|
1006	979	\| next \| input \| The core processes the rest of the message blocks using the result from the previous blocks. \|
1007	980	\| zeroize \| input \| The core clears all internal registers to avoid any SCA information leakage. \|
1008		-\| key\[383:0\] \| input \| The input key. \|
	981	+\| csr_mode \| input \| When set, the key comes from the cptra_csr_hmac_key interface pins. This key is valid only during MANUFACTURING mode. \|
	982	+\| mode \| input \| Selects the HMAC variant of the function. This can be: <br>- HMAC384 <br>- HMAC512. \|
	983	+\| cptra_csr_hmac_key\[511:0\] \| input \| The key to be used during csr mode. \|
	984	+\| key\[511:0\] \| input \| The input key. \|
1009	985	\| block\[1023:0\] \| input \| The input padded block of message. \|
1010		-\| LFSR_seed\[159:0\] \| Input \| The input to seed PRNG to enable the masking countermeasure for SCA protection. \|
	986	+\| LFSR_seed\[383:0\] \| Input \| The input to seed PRNG to enable the masking countermeasure for SCA protection. \|
1011	987	\| ready \| output \| When HIGH, the signal indicates the core is ready. \|
1012		-\| tag\[383:0\] \| output \| The HMAC value of the given key or block. For PRF-HMAC-SHA-384, a 384-bit tag is required. For HMAC-SHA-384-192, the host is responsible for reading 192 bits from the MSB. \|
	988	+\| tag\[511:0\] \| output \| The HMAC value of the given key or block. For PRF-HMAC-SHA-512, a 512-bit tag is required. For HMAC-SHA-512-256, the host is responsible for reading 256 bits from the MSB. \|
1013	989	\| tag_valid \| output \| When HIGH, the signal indicates the result is ready. \|
1014	990
1015	991
@@ -1021,7 +997,7 @@
1021	997
1022	998	The following pseudocode demonstrates how the HMAC interface can be implemented.
1023	999
1024		-Figure 30: HMAC pseudocode
	1000	+Figure 31: HMAC pseudocode
1025	1001
1026	1002	![](../images/caliptra-rtl/docs/images/HMAC_pseudo.png)
1027	1003
@@ -1033,7 +1009,7 @@
1033	1009
1034	1010	The embedded countermeasures are based on "Differential Power Analysis of HMAC Based on SHA-2, and Countermeasures" by McEvoy et al. To provide the required random values for masking intermediate values, a lightweight 74-bit LFSR is implemented. Based on “Spin Me Right Round Rotational Symmetry for FPGA-specific AES” by Wegener et al., LFSR is sufficient for masking statistical randomness.
1035	1011
1036		-Each round of SHA512 execution needs 6,432 random bits, while one HMAC operation needs at least 4 rounds of SHA512 operations. However, the proposed architecture requires only 160-bit LFSR seed and provides first-order DPA attack protection at the cost of 10% latency overhead with negligible hardware resource overhead.
	1012	+Each round of SHA512 execution needs 6,432 random bits, while one HMAC operation needs at least 4 rounds of SHA512 operations. However, the proposed architecture requires only 384-bit LFSR seed and provides first-order DPA attack protection at the cost of 10% latency overhead with negligible hardware resource overhead.
1037	1013
1038	1014	### Performance
1039	1015
@@ -1054,9 +1030,9 @@
1054	1030	\| 128 KiB message \| 207,979 \| 519.947 \| 1,923 \|
1055	1031
1056	1032
1057		-#### Hardware/software architecture
1058		-
1059		-In this architecture, the HMAC interface and controller are implemented in RISC-V core. The performance specification of the HMAC architecture is reported as shown in the following table.
	1033	+#### Hardware/software architecture
	1034	+
	1035	+In this architecture, the HMAC interface and controller are implemented in RISC-V core. The performance specification of the HMAC architecture is reported as shown in the following table.
1060	1036
1061	1037	\| Operation \| Cycle count \[CCs\] \| Time \[us\] @ 400 MHz \| Throughput \[op/s\] \|
1062	1038	\| :-------------------- \| :------------------ \| :-------------------- \| :------------------ \|
@@ -1090,7 +1066,7 @@
1090	1066
1091	1067	1. Set V_init = 0x01 0x01 0x01 ... 0x01 (V has 384-bit)
1092	1068	2. Set K_init = 0x00 0x00 0x00 ... 0x00 (K has 384-bit)
1093		- 3. K_tmp = HMAC(K_init, V_init \|\| 0x00 \|\| entropy \|\| nonce)
	1069	+ 3. K_tmp = HMAC(K_init, V_init \|\| 0x00 \|\| entropy \|\| nonce)
1094	1070	4. V_tmp = HMAC(K_tmp, V_init)
1095	1071	5. K_new = HMAC(K_tmp, V_tmp \|\| 0x01 \|\| entropy \|\| nonce)
1096	1072	6. V_new = HMAC(K_new, V_tmp)
@@ -1138,13 +1114,15 @@
1138	1114
1139	1115	## ECC
1140	1116
1141		-The ECC unit includes the ECDSA (Elliptic Curve Digital Signature Algorithm) engine, offering a variant of the cryptographically secure Digital Signature Algorithm (DSA), which uses elliptic curve (ECC). A digital signature is an authentication method in which a public key pair and a digital certificate are used as a signature to verify the identity of a recipient or sender of information.
	1117	+The ECC unit includes the ECDSA (Elliptic Curve Digital Signature Algorithm) engine and the ECDH (Elliptic Curve Diffie-Hellman Key-Exchange) engine, offering elliptic-curve variants of the cryptographically secure Digital Signature Algorithm (DSA) and Diffie-Hellman Key-Exchange (DH). A digital signature is an authentication method in which a public key pair and a digital certificate are used as a signature to verify the identity of a recipient or sender of information.
1142	1118
1143	1119	The hardware implementation supports deterministic ECDSA, 384 Bits (Prime Field), also known as NIST-Secp384r1, described in RFC6979.
1144	1120
	1121	+The hardware implementation also supports ECDH, 384 Bits (Prime Field), also known as NIST-Secp384r1, described in SP800-56A.
	1122	+
1145	1123	Secp384r1 parameters are shown in the following figure.
1146	1124
1147		-Figure 31: Secp384r1 parameters
	1125	+Figure 32: Secp384r1 parameters
1148	1126
1149	1127	![](../images/caliptra-rtl/docs/images/secp384r1_params.png)
1150	1128
@@ -1152,9 +1130,11 @@
1152	1130
1153	1131	The ECDSA consists of three operations, shown in the following figure.
1154	1132
1155		-Figure 32: ECDSA operations
	1133	+Figure 33: ECDSA operations
1156	1134
1157	1135	![](../images/caliptra-rtl/docs/images/ECDSA_ops.png)
	1136	+
	1137	+ECDH consists of a single operation, sharedkey generation.
1158	1138
1159	1139	#### KeyGen
1160	1140
@@ -1166,7 +1146,7 @@
1166	1146
1167	1147	#### Signing
1168	1148
1169		-In the signing algorithm, a signature (r, s) is generated by Sign(privKey, h), taking a privKey and hash of message m, h = hash(m), using a cryptographic hash function, SHA384. The signing algorithm includes:
	1149	+In the signing algorithm, a signature (r, s) is generated by Sign(privKey, h), taking a privKey and hash of message m, h = hash(m), using a cryptographic hash function, SHA512. The signing algorithm includes:
1170	1150
1171	1151	* Generate a random number k in the range [1..n-1], while k = HMAC\_DRBG(privKey, h)
1172	1152	* Calculate the random point R = k × G
@@ -1176,24 +1156,32 @@
1176	1156
1177	1157	#### Verifying
1178	1158
1179		-The signature (r, s) can be verified by Verify(pubKey ,h ,r, s) considering the public key pubKey and hash of message m, h=hash(m) using the same cryptographic hash function SHA384. The output is r’ value of verifying a signature. The ECDSA verify algorithm includes:
	1159	+The signature (r, s) can be verified by Verify(pubKey, h, r, s), using the public key pubKey and the hash of message m, h = hash(m), computed with the same cryptographic hash function, SHA512. The output is the value r’ produced by verifying the signature. The ECDSA verify algorithm includes:
1180	1160
1181	1161	* Calculate s1 = s<sup>−1</sup> mod n
1182	1162	* Compute R' = (h × s1) × G + (r × s1) × pubKey
1183	1163	* Take r’ = R'x mod n, while R'x is x coordinate of R’=(R'x, R'y)
1184	1164	* Verify the signature by comparing whether r' == r
1185	1165
	1166	+#### ECDH sharedkey
	1167	+
	1168	+In ECDH sharedkey generation, the shared key is generated by ECDH_sharedkey(privKey_A, pubKey_B), taking the party's own private key (privKey_A) and the other party's public key (pubKey_B). The ECDH sharedkey algorithm is as follows:
	1169	+
	1170	+* Compute P = sharedkey(privKey_A, pubKey_B) = privKey_A × pubKey_B, where P = (Px, Py) is a point on the curve.
	1171	+* Output sharedkey = Px, where Px is x coordinate of P.
	1172	+
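The shared key computation can be demonstrated end to end on a small curve. The sketch below deliberately uses a textbook toy curve (y² = x³ + 2x + 2 over F₁₇ with base point (5, 1) of order 19) instead of Secp384r1 so that the numbers stay readable; the curve, the helper names, and the private key values are all illustrative assumptions. The scalar multiplication is written as a Montgomery ladder in software and does not model the hardware countermeasures.

```python
# Toy curve parameters (NOT Secp384r1): y^2 = x^3 + 2x + 2 over F_17
P_MOD, A = 17, 2
G = (5, 1)   # base point
N = 19       # order of G


def point_add(p1, p2):
    """Add two affine points; None represents the point at infinity."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    x1, y1 = p1
    x2, y2 = p2
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return None
    if p1 == p2:
        lam = (3 * x1 * x1 + A) * pow((2 * y1) % P_MOD, -1, P_MOD) % P_MOD
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD) % P_MOD
    x3 = (lam * lam - x1 - x2) % P_MOD
    y3 = (lam * (x1 - x3) - y1) % P_MOD
    return (x3, y3)


def scalar_mult(k, pt):
    """Montgomery ladder: keeps the invariant r1 = r0 + pt."""
    r0, r1 = None, pt
    for bit in bin(k)[2:]:
        if bit == "1":
            r0, r1 = point_add(r0, r1), point_add(r1, r1)
        else:
            r0, r1 = point_add(r0, r0), point_add(r0, r1)
    return r0


def ecdh_sharedkey(priv_key, pub_key):
    """Return Px, the x coordinate of P = priv_key x pub_key."""
    point = scalar_mult(priv_key, pub_key)
    if point is None:
        raise ValueError("shared point is the point at infinity")
    return point[0]


if __name__ == "__main__":
    priv_a, priv_b = 3, 7  # illustrative private keys in [1, N-1]
    pub_a, pub_b = scalar_mult(priv_a, G), scalar_mult(priv_b, G)
    print(ecdh_sharedkey(priv_a, pub_b) == ecdh_sharedkey(priv_b, pub_a))
```

Both parties arrive at the same x coordinate because privKey_A × (privKey_B × G) and privKey_B × (privKey_A × G) are the same point.
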
	1173	+
1186	1174	### Architecture
1187	1175
1188	1176	The ECC top-level architecture is shown in the following figure.
1189	1177
1190		-Figure 33: ECDSA architecture
1191		-
1192		-![](../images/caliptra-rtl/docs/images/ECDSA_arch.png)
	1178	+Figure 34: ECC architecture
	1179	+
	1180	+![](../images/caliptra-rtl/docs/images/ECC_arch.png)
1193	1181
1194	1182	### Signal descriptions
1195	1183
1196		-The ECDSA architecture inputs and outputs are described in the following table.
	1184	+The ECC architecture inputs and outputs are described in the following table.
1197	1185
1198	1186
1199	1187	\| Name \| Input or output \| Description \|
@@ -1206,49 +1194,56 @@
1206	1194	\| nonce \[383:0\] \| input \| The deterministic nonce for HMAC_DRBG in the KeyGen operation. \|
1207	1195	\| privKey_in\[383:0\] \| input \| The input private key used in the signing operation. \|
1208	1196	\| pubKey_in\[1:0\]\[383:0\] \| input \| The input public key(x,y) used in the verifying operation. \|
1209		-\| hashed_msg\[383:0\] \| input \| The hash of message using SHA384. \|
	1197	+\| hashed_msg\[383:0\] \| input \| The hash of message using SHA512. \|
1210	1198	\| ready \| output \| When HIGH, the signal indicates the core is ready. \|
1211	1199	\| privKey_out\[383:0\] \| output \| The generated private key in the KeyGen operation. \|
1212	1200	\| pubKey_out\[1:0\]\[383:0\] \| output \| The generated public key(x,y) in the KeyGen operation. \|
1213	1201	\| r\[383:0\] \| output \| The signature value of the given privKey/message. \|
1214	1202	\| s\[383:0\] \| output \| The signature value of the given privKey/message. \|
1215	1203	\| r’\[383:0\] \| Output \| The signature verification result. \|
	1204	+\| DH_sharedkey\[383:0\] \| output \| The generated shared key in the ECDH sharedkey operation. \|
1216	1205	\| valid \| output \| When HIGH, the signal indicates the result is ready. \|
1217	1206
1218	1207
1219	1208	### Address map
1220	1209
1221		-The ECDSA address map is shown here: [ecc\_reg — clp Reference (chipsalliance.github.io)](https://chipsalliance.github.io/caliptra-rtl/main/internal-regs/?p=clp.ecc_reg).
	1210	+The ECC address map is shown here: [ecc\_reg — clp Reference (chipsalliance.github.io)](https://chipsalliance.github.io/caliptra-rtl/main/internal-regs/?p=clp.ecc_reg).
1222	1211
1223	1212	### Pseudocode
1224	1213
1225		-The following pseudocode blocks demonstrate example implementations for KeyGen, Signing, and Verifying.
	1214	+The following pseudocode blocks demonstrate example implementations for KeyGen, Signing, Verifying, and ECDH sharedkey.
1226	1215
1227	1216	#### KeyGen
1228	1217
1229		-Figure 34: KeyGen pseudocode
	1218	+Figure 35: KeyGen pseudocode
1230	1219
1231	1220	![](../images/caliptra-rtl/docs/images/keygen_pseudo.png)
1232	1221
1233	1222	#### Signing
1234	1223
1235		-Figure 35: Signing pseudocode
	1224	+Figure 36: Signing pseudocode
1236	1225
1237	1226	![](../images/caliptra-rtl/docs/images/signing_pseudo.png)
1238	1227
1239	1228	#### Verifying
1240	1229
1241		-Figure 36: Verifying pseudocode
	1230	+Figure 37: Verifying pseudocode
1242	1231
1243	1232	![](../images/caliptra-rtl/docs/images/verify_pseudo.png)
1244	1233
	1234	+#### ECDH sharedkey
	1235	+
	1236	+Figure 38: ECDH sharedkey pseudocode
	1237	+
	1238	+![](../images/caliptra-rtl/docs/images/sharedkey_pseudo.png)
	1239	+
1245	1240	### SCA countermeasure
1246	1241
1247		-The described ECDSA has three main routines: KeyGen, Signing, and Verifying. Since the Verifying routine requires operation with public values rather than a secret value, our side-channel analysis does not cover this routine. Our evaluation covers the KeyGen and Signing routines where the secret values are processed.
1248		-
1249		-KeyGen consists of HMAC DRBG and scalar multiplication, while Signing first requires a message hashing and then follows the same operations as KeyGen (HMAC DRBG and scalar multiplication). The last step of Signing is generating “S” as the proof of signature. Since HMAC DRBG and hash operations are evaluated separately in our document, this evaluation covers scalar multiplication and modular arithmetic operations.
1250		-
1251		-#### Scalar multiplication
	1242	+The described ECC has four main routines: KeyGen, Signing, Verifying, and ECDH sharedkey. Since the Verifying routine requires operation with public values rather than a secret value, our side-channel analysis does not cover this routine. Our evaluation covers the KeyGen, Signing, and ECDH sharedkey routines where the secret values are processed.
	1243	+
	1244	+KeyGen consists of HMAC DRBG and scalar multiplication, while Signing first requires a message hashing and then follows the same operations as KeyGen (HMAC DRBG and scalar multiplication). The last step of Signing is generating “S” as the proof of signature. Since HMAC DRBG and hash operations are evaluated separately in our document, this evaluation covers scalar multiplication and modular arithmetic operations.
	1245	+
	1246	+#### Scalar multiplication
1252	1247
1253	1248	To perform the scalar multiplication, the Montgomery ladder is implemented, which is inherently resistant to timing and single power analysis (SPA) attacks.
1254	1249
@@ -1256,7 +1251,7 @@
1256	1251
1257	1252	To protect the architecture against horizontal power/electromagnetic (EM) and differential power analysis (DPA) attacks, several countermeasures are embedded in the design [9]. Since these countermeasures require random inputs, HMAC-DRBG is fed by IV to generate these random values.
1258	1253
1259		-Since HMAC-DRBG generates random value in a deterministic way, firmware MUST feed different IV to ECC engine for EACH keygen and signing operation.
	1254	+Since HMAC-DRBG generates random values deterministically, firmware MUST feed a different IV to the ECC engine for EACH keygen, signing, and ECDH sharedkey operation.
1260	1255
1261	1256	#### Base point randomization
1262	1257
@@ -1284,7 +1279,7 @@
1284	1279
1285	1280	Generating “S” as the proof of signature at the steps of the signing operation leaks where the hashed message is signed with private key and ephemeral key as follows:
1286	1281
1287		-Since the given message is known or the signature part r is known, the attacker can perform a known-plaintext attack. The attacker can sign multiple messages with the same key, or the attacker can observe part of the signature that is generated with multiple messages but the same key.
	1282	+Since the given message is known or the signature part r is known, the attacker can perform a known-plaintext attack. The attacker can sign multiple messages with the same key, or the attacker can observe part of the signature that is generated with multiple messages but the same key.
1288	1283
1289	1284	The evaluation shows that the CPA attack can be performed with a small number of traces. Thus, an arithmetic masked design for these operations is implemented.
1290	1285
@@ -1292,7 +1287,7 @@
1292	1287
1293	1288	This countermeasure is achieved by randomizing the privkey as follows:
1294	1289
1295		-Although computation of “S” seems the most vulnerable point in our scheme, the operation does not have a big contribution to overall latency. Hence, masking these operations has low overhead on the cost of the design.
	1290	+Although computation of “S” seems the most vulnerable point in our scheme, the operation does not have a big contribution to overall latency. Hence, masking these operations has low overhead on the cost of the design.
1296	1291
1297	1292	#### Random number generator for SCA countermeasure
1298	1293
@@ -1304,7 +1299,7 @@
1304	1299	2. KEYGEN PRIVKEY: Running HMAC\_DRBG with seed and nonce to generate the privkey in KEYGEN operation.
1305	1300	3. SIGNING NONCE: Running HMAC\_DRBG based on RFC6979 in SIGNING operation with privkey and hashed\_msg.
1306	1301
1307		-Figure 37: HMAC\_DRBG utilization
	1302	+Figure 39: HMAC\_DRBG utilization
1308	1303
1309	1304	![](../images/caliptra-rtl/docs/images/HMAC_DRBG_util.png)
1310	1305
@@ -1320,7 +1315,7 @@
1320	1315
1321	1316	The data flow of the HMAC\_DRBG operation in keygen operation mode is shown in the following figure.
1322	1317
1323		-Figure 38: HMAC\_DRBG data flow
	1318	+Figure 40: HMAC\_DRBG data flow
1324	1319
1325	1320	![](../images/caliptra-rtl/docs/images/HMAC_DRBG_data.png)
1326	1321
@@ -1330,7 +1325,7 @@
1330	1325
1331	1326	In practice, observing a t-value greater than a specific threshold (mainly 4.5) indicates the presence of leakage. However, in ECC, due to its latency, around 5 million samples are required to be captured. This latency leads to many false positives and the TVLA threshold can be considered a higher value than 4.5. Based on the following figure from “Side-Channel Analysis and Countermeasure Design for Implementation of Curve448 on Cortex-M4” by Bisheh-Niasar et al., the threshold can be considered equal to 7 in our case.
1332	1327
1333		-Figure 39: TVLA threshold as a function of the number of samples per trace
	1328	+Figure 41: TVLA threshold as a function of the number of samples per trace
1334	1329
1335	1330	![](../images/caliptra-rtl/docs/images/TVLA_threshold.png)
1336	1331
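The t-value comparison described above can be sketched as follows. This assumes the usual TVLA formulation based on Welch's t-test computed per sample point across two sets of traces; the function names and the small data sets are illustrative and unrelated to the measurements reported in this document.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t-statistic for two groups of samples at one time point."""
    denom = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    diff = mean(a) - mean(b)
    if denom == 0.0:
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / denom


def leakage_detected(traces_a, traces_b, threshold=7.0):
    """True if any sample point exceeds the t-value threshold.

    Each trace is a sequence of samples; all traces have equal length.
    The default of 7 follows the threshold discussed in the text.
    """
    columns = zip(zip(*traces_a), zip(*traces_b))
    return any(abs(welch_t(ca, cb)) > threshold for ca, cb in columns)


if __name__ == "__main__":
    set_a = [[1, 0], [2, 0], [3, 1], [4, 1], [5, 0]]
    set_b = [[11, 0], [12, 1], [13, 0], [14, 1], [15, 1]]
    print(leakage_detected(set_a, set_b))
```
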
@@ -1340,7 +1335,7 @@
1340	1335	The TVLA results for performing seed/nonce-dependent leakage detection using 200,000 traces are shown in the following figure. Based on this figure, there is no leakage in ECC keygen by changing the seed/nonce after 200,000 operations.
1341	1336
1342	1337
1343		-Figure 40: seed/nonce-dependent leakage detection using TVLA for ECC keygen after 200,000 traces
	1338	+Figure 42: seed/nonce-dependent leakage detection using TVLA for ECC keygen after 200,000 traces
1344	1339
1345	1340	![](../images/caliptra-rtl/docs/images/tvla_keygen.png)
1346	1341
@@ -1348,13 +1343,13 @@
1348	1343
1349	1344	The TVLA results for performing privkey-dependent leakage detection using 20,000 traces are shown in the following figure. Based on this figure, there is no leakage in ECC signing by changing the privkey after 20,000 operations.
1350	1345
1351		-Figure 41: privkey-dependent leakage detection using TVLA for ECC signing after 20,000 traces
	1346	+Figure 43: privkey-dependent leakage detection using TVLA for ECC signing after 20,000 traces
1352	1347
1353	1348	![](../images/caliptra-rtl/docs/images/TVLA_privekey.png)
1354	1349
1355	1350	The TVLA results for performing message-dependent leakage detection using 64,000 traces are shown in the following figure. Based on this figure, there is no leakage in ECC signing by changing the message after 64,000 operations.
1356	1351
1357		-Figure 42: Message-dependent leakage detection using TVLA for ECC signing after 64,000 traces
	1352	+Figure 44: Message-dependent leakage detection using TVLA for ECC signing after 64,000 traces
1358	1353
1359	1354	![](../images/caliptra-rtl/docs/images/TVLA_msg_dependent.png)
1360	1355
@@ -1391,17 +1386,17 @@
1391	1386
1392	1387	## LMS Accelerator
1393	1388
1394		-LMS cryptography is a type of hash-based digital signature scheme that was standardized by NIST in 2020. It is based on the Leighton-Micali Signature (LMS) system, which uses a Merkle tree structure to combine many one-time signature (OTS) keys into a single public key. LMS cryptography is resistant to quantum attacks and can achieve a high level of security without relying on large integer mathematics.
	1389	+LMS cryptography is a type of hash-based digital signature scheme that was standardized by NIST in 2020. It is based on the Leighton-Micali Signature (LMS) system, which uses a Merkle tree structure to combine many one-time signature (OTS) keys into a single public key. LMS cryptography is resistant to quantum attacks and can achieve a high level of security without relying on large integer mathematics.
1395	1390
1396	1391	Caliptra supports only LMS verification using a software/hardware co-design approach. Hence, the LMS accelerator reuses the SHA256 engine to speed up the Winternitz chain by removing software-hardware interface overhead. The LMS-OTS verification algorithm is shown in the following figure:
1397	1392
1398		-Figure 43: LMS-OTS Verification algorithm
	1393	+Figure 45: LMS-OTS Verification algorithm
1399	1394
1400	1395	![](../images/caliptra-rtl/docs/images/LMS_verifying_alg.png)
1401	1396
1402	1397	The high-level architecture of LMS is shown in the following figure.
1403	1398
1404		-Figure 44: LMS high-level architecture
	1399	+Figure 46: LMS high-level architecture
1405	1400
1406	1401	![](../images/caliptra-rtl/docs/images/LMS_high_level.png)
1407	1402
@@ -1426,7 +1421,7 @@
1426	1421
1427	1422	The Winternitz hash chain can be accelerated in hardware to enhance the performance of the design. For that, a configurable architecture is proposed that can reuse SHA256 engine. The LMS accelerator architecture is shown in the following figure, while H is SHA256 engine.
1428	1423
1429		-Figure 45: Winternitz chain architecture
	1424	+Figure 47: Winternitz chain architecture
1430	1425
1431	1426	![](../images/caliptra-rtl/docs/images/LMS_wntz_arch.png)
1432	1427
@@ -1456,10 +1451,794 @@
1456	1451
1457	1452	The address map for LMS accelerator integrated into SHA256 is shown here: [sha256\_reg — clp Reference (chipsalliance.github.io)](https://chipsalliance.github.io/caliptra-rtl/main/internal-regs/?p=clp.sha256_reg).
1458	1453
	1454	+## Adams Bridge - Dilithium (ML-DSA)
	1455	+
	1456	+Please refer to the [Adams-bridge specification](https://github.com/chipsalliance/adams-bridge/blob/main/docs/AdamsBridgeHardwareSpecification.md)
	1457	+
	1458	+### Address map
	1459	+Address map of ML-DSA accelerator is shown here: [ML-DSA\_reg — clp Reference (chipsalliance.github.io)](https://chipsalliance.github.io/caliptra-rtl/main/internal-regs/?p=clp.mldsa_reg)
	1460	+
	1461	+## AES
	1462	+
	1463	+The AES unit is a cryptographic accelerator that processes requests from the processor to encrypt or decrypt 16-byte data blocks. It supports AES-128/192/256 in various modes, including Electronic Codebook (ECB), Cipher Block Chaining (CBC), Cipher Feedback (CFB) with a fixed segment size of 128 bits (CFB-128), Output Feedback (OFB), Counter (CTR), and Galois/Counter Mode (GCM).
	1464	+
	1465	+The AES unit is reused from OpenTitan (see [aes](https://github.com/lowRISC/opentitan/tree/master/hw/ip/aes)) with a shim to translate from AHB-lite to the TL-UL interface.
	1466	+
	1467	+Additional registers have been added to support key vault integration. Keys from the key vault can be loaded into the AES unit to be used for encryption or decryption.
	1468	+
	1469	+### Operation
	1470	+
	1471	+For more information, see the [AES Programmer's Guide](https://github.com/vogelpi/opentitan/blob/aes-gcm-review/hw/ip/aes/doc/programmers_guide.md).
	1472	+
	1473	+### Signal descriptions
	1474	+
	1475	+The AES architecture inputs and outputs are described in the following table.
	1476	+
	1477	+\| Name \| Input or output \| Description \|
	1478	+\| :--------------------------------- \| :-------------- \| :----------- \|
	1479	+\| clk \| input \| All signal timings are related to the rising edge of clk. \|
	1480	+\| reset_n \| input \| The reset signal is active LOW and resets the core. This is the only active LOW signal. \|
	1481	+\| DATA_IN \| input \| Input block to be encrypted or decrypted. Written in four 32-bit registers. \|
	1482	+\| DATA_OUT \| output \| Output block result of encryption or decryption. Stored in four 32-bit registers. \|
	1483	+\| CTRL_SHADOWED.MANUAL_OPERATION \| input \| Configures the AES core to operate in manual mode. \|
	1484	+\| CTRL_SHADOWED.PRNG_RESEED_RATE \| input \| Configures the rate of reseeding the internal PRNG used for masking. \|
	1485	+\| CTRL_SHADOWED.SIDELOAD \| input \| When asserted, AES core will use the key from the keyvault interface. \|
	1486	+\| CTRL_SHADOWED.KEY_LEN \| input \| Configures the AES key length. Supports 128, 192, and 256-bit keys. \|
	1487	+\| CTRL_SHADOWED.MODE \| input \| Configures the AES block cipher mode. \|
	1488	+\| CTRL_SHADOWED.OPERATION \| input \| Configures the AES core to operate in encryption or decryption modes. \|
	1489	+\| CTRL_GCM_SHADOWED.PHASE \| input \| Configures the GCM phase. \|
	1490	+\| CTRL_GCM_SHADOWED.NUM_VALID_BYTES \| input \| Configures the number of valid bytes of the current input block in GCM. \|
	1491	+\| TRIGGER.PRNG_RESEED \| input \| Forces a PRNG reseed. \|
	1492	+\| TRIGGER.DATA_OUT_CLEAR \| input \| Clears the DATA_OUT registers with pseudo-random data. \|
	1493	+\| TRIGGER.KEY_IV_DATA_IN_CLEAR \| input \| Clears the Key, IV, and DATA_IN registers with pseudo-random data. \|
	1494	+\| TRIGGER.START \| input \| Triggers the encryption/decryption of one data block if in manual operation mode. \|
	1495	+\| STATUS.ALERT_FATAL_FAULT \| output \| A fatal fault has occurred and the AES unit needs to be reset. \|
	1496	+\| STATUS.ALERT_RECOV_CTRL_UPDATE_ERR \| output \| An update error has occurred in the shadowed Control Register. AES operation needs to be restarted by re-writing the Control Register. \|
	1497	+\| STATUS.INPUT_READY \| output \| The AES unit is ready to receive new data input via the DATA_IN registers. \|
	1498	+\| STATUS.OUTPUT_VALID \| output \| The AES unit has valid output data. \|
	1499	+\| STATUS.OUTPUT_LOST \| output \| All previous output data has been fully read by the processor (0) or at least one previous output data block has been lost (1). It has been overwritten by the AES unit before the processor could fully read it. Once set to 1, this flag remains set until AES operation is restarted by re-writing the Control Register. The primary use of this flag is for design verification. This flag is not meaningful if MANUAL_OPERATION=0. \|
	1500	+\| STATUS.STALL \| output \| The AES unit is stalled because there is previous output data that must be read by the processor before the AES unit can overwrite this data. This flag is not meaningful if MANUAL_OPERATION=1. \|
	1501	+\| STATUS.IDLE \| output \| The AES unit is idle. \|
	1502	+
	1503	+
	1504	+
	1505	+### Address map
	1506	+
	1507	+The AES address map is shown here: [aes\_clp\_reg — clp Reference (chipsalliance.github.io)](https://chipsalliance.github.io/caliptra-rtl/main/internal-regs/?p=clp.aes_clp_reg).
	1508	+
	1509	+### SCA countermeasures
	1510	+
	1511	+The AES unit employs separate SCA countermeasures for the AES cipher core used for the encryption/decryption part and for the GHASH module used for computing the integrity tag in GCM.
	1512	+
	1513	+### AES cipher core
	1514	+
	1515	+A detailed specification of the SCA countermeasure employed in the AES cipher core is shown here: [AES cipher core SCA countermeasure](https://opentitan.org/book/hw/ip/aes/doc/theory_of_operation.html#1st-order-masking-of-the-cipher-core).
	1516	+The most critical building block of the SCA countermeasure, i.e., the masked AES S-Box, successfully passes formal masking verification at the netlist level using [Alma: Execution-aware Masking Verification](https://github.com/IAIK/coco-alma).
	1517	+The flow required for repeating the formal masking verification using Alma together with a Howto can be found [here](https://github.com/lowRISC/opentitan/blob/master/hw/ip/aes/pre_sca/alma/README.md).
	1518	+The entire AES cipher core, including the masked S-Boxes as well as the PRNG generating the randomness for remasking, successfully passes masking evaluation at the netlist level using [PROLEAD - A Probing-Based Leakage Detection Tool for Hardware and Software](https://github.com/ChairImpSec/PROLEAD).
	1519	+The flow required for repeating the masking evaluation using PROLEAD together with a Howto can be found [here](https://github.com/lowRISC/opentitan/blob/aes-gcm-review/hw/ip/aes/pre_sca/prolead/README.md).
	1520	+
	1521	+### GHASH module
	1522	+
	1523	+A detailed specification of the SCA countermeasure employed in the GHASH module is shown here: [GHASH module SCA countermeasure](https://github.com/vogelpi/opentitan/blob/aes-gcm-review/hw/ip/aes/doc/theory_of_operation.md#1st-order-masking-of-the-ghash-module).
	1524	+
	1525	+To optimize and verify this masking countermeasure, two different types of experiments have been performed for which the results are given below.
	1526	+1. Formal masking verification using [Alma: Execution-aware Masking Verification](https://github.com/IAIK/coco-alma).
	1527	+ These experiments led to a [series of small design optimizations](https://github.com/vogelpi/opentitan/pull/18) which have been integrated into Caliptra.
	1528	+ The resulting design successfully passes formal masking verification at the netlist level.
	1529	+1. [Test-vector leakage assessment (TVLA)](https://www.rambus.com/wp-content/uploads/2015/08/TVLA-DTR-with-AES.pdf) applied to power SCA traces captured on a ChipWhisperer-based FPGA setup.
	1530	+ These experiments confirm the formal masking verification results:
	1531	+ No 1st-order SCA can be observed during the GHASH operation.
	1532	+ The leakage observed at the boundary of and outside the GHASH operation can be attributed to the evaluation methodology and the handling of unmasked and uncritical data, as well as to FPGA-specific leakage effects known from literature.
	1533	+ We are confident that the optimized SCA hardening concept effectively deters SCA attacks.
	1534	+
	1535	+#### Formal masking verification using Alma
	1536	+
	1537	+[Alma](https://ieeexplore.ieee.org/document/9617707) is an open source, formal masking verification tool developed at TU Graz which enables formal verification of masking SCA countermeasures at the netlist level.
	1538	+The main advantages of this approach compared to analyzing FPGA power traces are as follows:
	1539	+
	1540	+* The turn-around time is much faster as it does not involve FPGA bitstream generation and capturing power traces (both can take several hours).
	1541	+* Netlist-based analysis tools typically enable pinpointing sources of SCA leakage and easily allow analyzing sub parts of the masked design individually.
	1542	+ As a result, individual issues can be fixed up faster.
	1543	+* The analyzed netlist is closer to the targeted ASIC implementation.
	1544	+ During FPGA synthesis, the netlist is mapped to the logic elements available on the selected FPGA, such as look-up tables (LUTs), which are fundamentally different from simpler ASIC gates.
	1545	+
	1546	+However, formal netlist analysis tools may not be perfect and they also have limitations in terms of what can be analyzed.
	1547	+For example, the maximum supported netlist size depends on the complexity and number of the non-linear elements.
	1548	+Also, random number generators and in particular pseudo-random number generators typically need to be excluded from the analysis and random number inputs need to be assumed as ideal by tools.
	1549	+Thus, they don’t replace FPGA-based analysis.
	1550	+We use them to increase our confidence in our SCA countermeasures and to close countermeasure verification faster by reducing the number of FPGA evaluation runs.
	1551	+
	1552	+##### Prerequisites
	1553	+
	1554	+The [Alma-based formal masking verification flow together with a Howto](https://github.com/vogelpi/opentitan/tree/aes-gcm-review/hw/ip/aes/pre_sca/alma#readme) (including installation instructions) as well as an [open source Yosys synthesis flow](https://github.com/vogelpi/opentitan/tree/aes-gcm-review/hw/ip/aes/pre_syn) are available as open source.
	1555	+The tool can run both on generic Yosys netlists and on proprietary, technology-specific netlists.
	1556	+For the latter, a [slightly modified verification flow with an additional translation step](https://github.com/vogelpi/opentitan/tree/aes-gcm-review/hw/ip/aes/pre_sca/alma_post_syn#readme) is required.
	1557	+To verify the GHASH SCA countermeasure, the generic flow was used with the following tool versions:
	1558	+
	1559	+* Alma ([specific commit](https://github.com/vogelpi/coco-alma/commit/68e436f67dee7d27fb782864dc5523ceb4bd27bf))
	1560	+* Yosys 0.36 (git sha1 8f07a0d84)
	1561	+* sv2v v0.0.11-28-g81d8225
	1562	+* Verilator 4.214 2021-10-17 rev v4.214
	1563	+
	1564	+##### Yosys Netlist Synthesis
	1565	+
	1566	+Set up the [open source Yosys synthesis flow](https://github.com/vogelpi/opentitan/tree/aes-gcm-review/hw/ip/aes/pre_syn) by copying the [`syn_setup.example.sh`](https://github.com/vogelpi/opentitan/blob/aes-gcm-review/hw/ip/aes/pre_syn/syn_setup.example.sh) file and renaming it to `syn_setup.sh`.
	1567	+Change the `LR_SYNTH_TOP_MODULE` variable to `aes_ghash_wrap` and the `LR_SYNTH_CELL_LIBRARY_PATH` to the `NangateOpenCellLibrary_typical.lib` file in the folder where you installed the nangate45 library.
	1568	+
	1569	+Then, start the synthesis by executing
	1570	+
	1571	+```sh
	1572	+./syn_yosys.sh
	1573	+```
	1574	+This should produce output similar to what is shown below:
	1575	+
	1576	+```
	1577	+8. Printing statistics.
	1578	+
	1579	+=== aes_ghash_wrap ===
	1580	+
	1581	+ Number of wires: 24543
	1582	+ Number of wire bits: 29339
	1583	+ Number of public wires: 567
	1584	+ Number of public wire bits: 5363
	1585	+ Number of memories: 0
	1586	+ Number of memory bits: 0
	1587	+ Number of processes: 0
	1588	+ Number of cells: 26214
	1589	+ AND2_X1 1585
	1590	+ AND3_X1 4
	1591	+ AND4_X1 32
	1592	+ AOI211_X1 58
	1593	+ AOI21_X1 293
	1594	+ AOI221_X1 215
	1595	+ AOI22_X1 364
	1596	+ DFFR_X1 1468
	1597	+ DFFS_X1 5
	1598	+ INV_X1 584
	1599	+ MUX2_X1 1252
	1600	+ NAND2_X1 1870
	1601	+ NAND3_X1 128
	1602	+ NAND4_X1 37
	1603	+ NOR2_X1 7551
	1604	+ NOR3_X1 445
	1605	+ NOR4_X1 28
	1606	+ OAI211_X1 98
	1607	+ OAI21_X1 827
	1608	+ OAI221_X1 3
	1609	+ OAI22_X1 183
	1610	+ OR2_X1 28
	1611	+ OR3_X1 67
	1612	+ OR4_X1 2
	1613	+ XNOR2_X1 7122
	1614	+ XOR2_X1 1965
	1615	+
	1616	+ Chip area for module '\aes_ghash_wrap': 37534.728000
	1617	+
	1618	+====== End Yosys Stat Report ======
	1619	+
	1620	+Warnings: 20 unique messages, 102 total
	1621	+
	1622	+End of script. Logfile hash: 16c4d13569, CPU: user 25.11s system 0.12s, MEM: 176.29 MB peak
	1623	+Yosys 0.36 (git sha1 8f07a0d84, gcc 11.4.0-1ubuntu1~22.04 -fPIC -Os)
	1624	+Time spent: 66% 2x abc (47 sec), 9% 40x opt_expr (6 sec), ...
	1625	+Area in kGE = 47.04
	1626	+```
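
The relation between the reported chip area and the final `Area in kGE` line can be checked with a short calculation. This is a sketch under one assumption: that one gate equivalent (GE) is the area of a `NAND2_X1` cell, taken here as 0.798 µm² for the nangate45 typical library. That value is not stated in the log above, but it reproduces the reported number.

```python
# Assumption: one gate equivalent (GE) equals the area of a NAND2_X1 cell,
# taken as 0.798 um^2 for the nangate45 typical library.
NAND2_X1_AREA_UM2 = 0.798


def area_in_kge(chip_area_um2: float, nand2_area_um2: float = NAND2_X1_AREA_UM2) -> float:
    """Convert a synthesis area report (um^2) into kilo gate equivalents."""
    return chip_area_um2 / nand2_area_um2 / 1000.0


# Chip area reported by Yosys for aes_ghash_wrap in the log above.
print(f"Area in kGE = {area_in_kge(37534.728):.2f}")  # prints "Area in kGE = 47.04"
```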
	1627	+
	1628	+Note that the reported area is noticeably larger than the number reported in the [GHASH SCA countermeasure specification](https://github.com/vogelpi/opentitan/blob/aes-gcm-review/hw/ip/aes/doc/theory_of_operation.md#1st-order-masking-of-the-ghash-module).
	1629	+The reasons are twofold:
	1630	+
	1631	+1. The `aes_ghash_wrap` module synthesized is a wrapper module around the GHASH module in focus of this analysis.
	1632	+ The goal of the wrapper is to separately feed in secrets (the hash subkey H and the encrypted initial counter block S) as well as randomness in a tool aware manner.
	1633	+ As such, the wrapper includes some additional muxing resources and a counter to ease interpretation of results.
	1634	+2. To speed up the formal analysis, the pipelined Galois-field multipliers have been instantiated with a latency of 4 instead of 32 clock cycles as on FPGA.
	1635	+ While the latency, or more precisely the processing parallelism, does have an impact on the SNR, it does not have an impact on the formal netlist analysis, which is performed in an essentially noise-free environment.
	1636	+
	1637	+##### Formal Netlist Analysis
	1638	+
	1639	+After synthesizing the netlist, the following steps should be taken to perform the analysis:
	1640	+
	1641	+1. Make sure to source the `build_consts.sh` script
	1642	+ ```sh
	1643	+ source util/build_consts.sh
	1644	+ ```
	1645	+ in order to set up some shell variables.
	1646	+
	1647	+1. Enter the directory where you have downloaded Alma and load the virtual Python environment
	1648	+ ```sh
	1649	+ source dev/bin/activate
	1650	+ ```
	1651	+
	1652	+1. Launch the Alma tool to parse, trace (simulate) and formally verify the netlist.
	1653	+ For simplicity, a single script is provided to launch all the required steps with a single command.
	1654	+ Simply run
	1655	+ ```sh
	1656	+ ${REPO_TOP}/hw/ip/aes/pre_sca/alma/verify_aes_ghash.sh
	1657	+ ```
	1658	+ This should produce output similar to the one below:
	1659	+ ```
	1660	+ Verifying aes_ghash_wrap using Alma
	1661	+ Starting yosys synthesis...
	1662	+ \| CircuitGraph \| Total: 29882 \| Linear: 9091 \| Non-linear: 12741 \| Registers: 1473 \| Mux: 3538 \|
	1663	+ parse.py successful (47.99s)
	1664	+ 1: Running verilator on given netlist
	1665	+ 2: Compiling verilated netlist library
	1666	+ 3: Compiling provided verilator testbench
	1667	+ 4: Simulating circuit and generating VCD
	1668	+ \| CircuitGraph \| Total: 29882 \| Linear: 9091 \| Non-linear: 12741 \| Registers: 1473 \| Mux: 3538 \|
	1669	+ tmp/tmp.vcd:24765: [WARNING] Entry for name alert_fatal_i already exists in namemap (alert_fatal_i -> Ce")
	1670	+ tmp/tmp.vcd:24766: [WARNING] Entry for name alert_o already exists in namemap (alert_o -> De")
	1671	+ tmp/tmp.vcd:24767: [WARNING] Entry for name clear_i already exists in namemap (clear_i -> Ee")
	1672	+ tmp/tmp.vcd:24768: [WARNING] Entry for name clk_i already exists in namemap (clk_i -> Fe")
	1673	+ tmp/tmp.vcd:24770: [WARNING] Entry for name cyc_ctr_o already exists in namemap (cyc_ctr_o -> Ge")
	1674	+ tmp/tmp.vcd:24771: [WARNING] Entry for name data_in_prev_i already exists in namemap (data_in_prev_i -> He")
	1675	+ tmp/tmp.vcd:24772: [WARNING] Entry for name data_out_i already exists in namemap (data_out_i -> Le")
	1676	+ tmp/tmp.vcd:24773: [WARNING] Entry for name first_block_o already exists in namemap (first_block_o -> Pe")
	1677	+ tmp/tmp.vcd:24774: [WARNING] Entry for name gcm_phase_i already exists in namemap (gcm_phase_i -> Qe")
	1678	+ tmp/tmp.vcd:24775: [WARNING] Entry for name ghash_state_done_o already exists in namemap (ghash_state_done_o -> Re")
	1679	+ tmp/tmp.vcd:24776: [WARNING] Entry for name hash_subkey_i already exists in namemap (hash_subkey_i -> Ve")
	1680	+ tmp/tmp.vcd:24777: [WARNING] Entry for name in_ready_o already exists in namemap (in_ready_o -> ^e")
	1681	+ tmp/tmp.vcd:24778: [WARNING] Entry for name in_valid_i already exists in namemap (in_valid_i -> _e")
	1682	+ tmp/tmp.vcd:24779: [WARNING] Entry for name load_hash_subkey_i already exists in namemap (load_hash_subkey_i -> `e")
	1683	+ tmp/tmp.vcd:24780: [WARNING] Entry for name num_valid_bytes_i already exists in namemap (num_valid_bytes_i -> ae")
	1684	+ tmp/tmp.vcd:24781: [WARNING] Entry for name op_i already exists in namemap (op_i -> be")
	1685	+ tmp/tmp.vcd:24782: [WARNING] Entry for name out_ready_i already exists in namemap (out_ready_i -> ce")
	1686	+ tmp/tmp.vcd:24783: [WARNING] Entry for name out_valid_o already exists in namemap (out_valid_o -> de")
	1687	+ tmp/tmp.vcd:24784: [WARNING] Entry for name prd_i already exists in namemap (prd_i -> ee")
	1688	+ tmp/tmp.vcd:24785: [WARNING] Entry for name rst_ni already exists in namemap (rst_ni -> me")
	1689	+ tmp/tmp.vcd:24786: [WARNING] Entry for name s_i already exists in namemap (s_i -> ne")
	1690	+ 0
	1691	+ 0
	1692	+ Building formula for cycle 0: vars 0 clauses 0
	1693	+ Checking cycle 0:
	1694	+ Building formula for cycle 1: vars 1024 clauses 1536
	1695	+ Checking cycle 1:
	1696	+ Building formula for cycle 2: vars 3968 clauses 6528
	1697	+ Checking cycle 2:
	1698	+ Building formula for cycle 3: vars 6298 clauses 11026
	1699	+ Checking cycle 3:
	1700	+ Building formula for cycle 4: vars 14888 clauses 34886
	1701	+ Checking cycle 4:
	1702	+ Building formula for cycle 5: vars 20924 clauses 52734
	1703	+ Checking cycle 5:
	1704	+ Building formula for cycle 6: vars 53986 clauses 143674
	1705	+ Checking cycle 6:
	1706	+ Building formula for cycle 7: vars 57570 clauses 150970
	1707	+ Checking cycle 7:
	1708	+ Building formula for cycle 8: vars 80484 clauses 169282
	1709	+ Checking cycle 8:
	1710	+ Building formula for cycle 9: vars 213770 clauses 504198
	1711	+ Checking cycle 9:
	1712	+ Building formula for cycle 10: vars 594390 clauses 1617276
	1713	+ Checking cycle 10:
	1714	+ Building formula for cycle 11: vars 1024018 clauses 2881744
	1715	+ Checking cycle 11:
	1716	+ Building formula for cycle 12: vars 1704424 clauses 4910342
	1717	+ Checking cycle 12:
	1718	+ Building formula for cycle 13: vars 1713897 clauses 4915466
	1719	+ Checking cycle 13:
	1720	+ Building formula for cycle 14: vars 1834911 clauses 5233038
	1721	+ Checking cycle 14:
	1722	+ Building formula for cycle 15: vars 2258841 clauses 6492446
	1723	+ Checking cycle 15:
	1724	+ Building formula for cycle 16: vars 2734646 clauses 7907830
	1725	+ Checking cycle 16:
	1726	+ Building formula for cycle 17: vars 5868600 clauses 18374416
	1727	+ Checking cycle 17:
	1728	+ Building formula for cycle 18: vars 5922747 clauses 18524578
	1729	+ Checking cycle 18:
	1730	+ Building formula for cycle 19: vars 6100898 clauses 19061808
	1731	+ Checking cycle 19:
	1732	+ Building formula for cycle 20: vars 6427297 clauses 20074334
	1733	+ Checking cycle 20:
	1734	+ Building formula for cycle 21: vars 6949506 clauses 21693947
	1735	+ Checking cycle 21:
	1736	+ Building formula for cycle 22: vars 6949506 clauses 21693947
	1737	+ Checking cycle 22:
	1738	+ Building formula for cycle 23: vars 6949506 clauses 21693947
	1739	+ Checking cycle 23:
	1740	+ Building formula for cycle 24: vars 7057992 clauses 21994175
	1741	+ Checking cycle 24:
	1742	+ Building formula for cycle 25: vars 7407412 clauses 23047989
	1743	+ Checking cycle 25:
	1744	+ Building formula for cycle 26: vars 7797810 clauses 24221073
	1745	+ Checking cycle 26:
	1746	+ Building formula for cycle 27: vars 10939700 clauses 34732235
	1747	+ Checking cycle 27:
	1748	+ Building formula for cycle 28: vars 11268148 clauses 35780811
	1749	+ Checking cycle 28:
	1750	+ Building formula for cycle 29: vars 11268148 clauses 35780811
	1751	+ Checking cycle 29:
	1752	+ Building formula for cycle 30: vars 11268148 clauses 35780811
	1753	+ Checking cycle 30:
	1754	+ Building formula for cycle 31: vars 11376634 clauses 36081039
	1755	+ Checking cycle 31:
	1756	+ Building formula for cycle 32: vars 11726054 clauses 37134853
	1757	+ Checking cycle 32:
	1758	+ Building formula for cycle 33: vars 12116452 clauses 38307937
	1759	+ Checking cycle 33:
	1760	+ Building formula for cycle 34: vars 15258342 clauses 48819099
	1761	+ Checking cycle 34:
	1762	+ Building formula for cycle 35: vars 15586534 clauses 49867675
	1763	+ Checking cycle 35:
	1764	+ Building formula for cycle 36: vars 15619430 clauses 49965979
	1765	+ Checking cycle 36:
	1766	+ Finished in 3948.52
	1767	+ The execution is secure
	1768	+ ```
	1769	+
	1770	+Notes:
	1771	+
	1772	+* This analysis exercises the full data path of the GHASH block and comprises the following operations (controlled by a small [Verilator testbench](https://github.com/vogelpi/opentitan/blob/aes-gcm-review/hw/ip/aes/pre_sca/alma/cpp/verilator_tb_aes_ghash_wrap.cpp)):
	1773	+ + Initial clearing of all internal registers.
	1774	+ + Loading the hash subkey H.
	1775	+ + Loading the encrypted initial counter block S including the subsequent generation of repeatedly used correction terms.
	1776	+ + Processing a first AAD/ciphertext block including the generation of a correction term that is used for the first block only.
	1777	+ + Processing a second AAD/ciphertext block.
	1778	+ + Producing the final authentication tag.
	1779	+
	1780	+* The [following main changes have been implemented as a result of the formal netlist analysis using Alma](https://github.com/vogelpi/opentitan/commit/ac9333116cbe65fa6b868fe02cb17344d1e2717f) (refer to the [countermeasure spec](https://github.com/vogelpi/opentitan/blob/aes-gcm-review/hw/ip/aes/doc/theory_of_operation.md#mapping-the-masked-algorithm-to-the-hardware) for details):
	1781	+ + The result of the final addition of Share 1 of S and the unmasked GHASH state is no longer stored into the GHASH state register but directly forwarded to the output, and the state input to this addition is blanked.
	1782	+ The input multiplexer (`ghash_in_mux`) loses one input.
	1783	+ + The two 3-input multiplexers selecting the operands for the addition with the GHASH state (`add_in_mux`) are replaced by one-hot multiplexers with registered control signals.
	1784	+ + The Operand B inputs of both GF multipliers are now blanked.
	1785	+ The 3-input multiplexer selecting Operand B of the second GF multiplier is replaced by a one-hot multiplexer with registered control signal.
	1786	+ In addition, the last input slice of Operand B for this multiplier is registered.
	1787	+ This allows switching the multiplexer during the last clock cycle of the multiplication to avoid some undesirable transient leakage occurring upon saving the result of the multiplication into the GHASH state register (and this new value propagating through the multiplexer into the multiplier again).
	1788	+ + The GF multipliers are configured to output zero instead of Operand A (the hash subkey) while busy.
	1789	+ + The state input for the addition required for the generation of the correction term for Share 0 is blanked.
	1790	+ + Between adding the correction terms to the GHASH state for the last time and unmasking the GHASH state, a bubble cycle is added to allow signals to fully settle, thereby avoiding undesirable transient effects unmasking the uncorrected state shares.
	1791	+* The overall area impact of these changes is low (+0.16 kGE in Yosys + nangate45).
	1792	+* The final design successfully passes the formal masking verification.
	1793	+ For details regarding tool parameters, check the [analysis script](https://github.com/vogelpi/opentitan/blob/aes-gcm-review/hw/ip/aes/pre_sca/alma/verify_aes_ghash.sh).
	1794	+
	1795	+#### ChipWhisperer-based FPGA evaluation and TVLA
	1796	+
	1797	+To underpin the results of the formal verification flow, the hardening of the GHASH module has been analyzed on the ChipWhisperer [CW310](https://rtfm.newae.com/Targets/CW310%20Bergen%20Board/) FPGA board.
	1798	+For this analysis, power traces with the ChipWhisperer [Husky](https://rtfm.newae.com/Capture/ChipWhisperer-Husky/) scope were captured during GCM operations.
	1799	+Afterwards a Test Vector Leakage Assessment (TVLA) with the [ot-sca toolset](https://github.com/lowRISC/ot-sca) has been performed.
	1800	+The setup is illustrated in Figure 1.
	1801	+
	1802	+![](../images/caliptra-rtl/docs/images/cw310_cwhusky.jpeg)
	1803	+:--:
	1804	+Figure 1: Target CW310 FPGA board (left) and the CW Husky scope (right).
	1805	+
	1806	+##### Setup
	1807	+
	1808	+![](../images/caliptra-rtl/docs/images/GHASH_TVLA_Figure2.png)
	1809	+:--:
	1810	+Figure 2: Measurement setup. The main components are the target board, the scope, and the SCA framework.
	1811	+
	1812	+Figure 2 gives a detailed overview of the measurement setup that has been utilized to capture the power traces.
	1813	+The SCA evaluation framework ot-sca is the central component of the measurement setup.
	1814	+It is responsible for communicating with the penetration testing framework that runs on the target FPGA board and with the scope.
	1815	+Initially, ot-sca configures the scope (sample rate, number of samples) and the pentest framework (which input, how many encryptions, where to trigger).
	1816	+
	1817	+Based on the configuration, the pentest framework generates the cipher input, starts the encryption, and sends back the computed tag to ot-sca.
	1818	+The trigger is automatically set and unset by the AES hardware block to achieve an accurate & constant trigger window.
	1819	+In parallel, the scope waits for the trigger, captures the power consumption, and transfers the traces to the SCA evaluation framework.
	1820	+The ot-sca framework stores the trace as well as the cipher configuration in a database.
	1821	+
	1822	+![](../images/caliptra-rtl/docs/images/GHASH_TVLA_Figure3.png)
	1823	+:--:
	1824	+Figure 3: Power trace with AES encryption rounds visible (left). Aligned traces when zooming in (right).
	1825	+
	1826	+Figure 3 depicts power traces captured during AES-GCM encryptions with the setup above.
	1827	+As shown in the figure, the traces are well aligned, which allows a sound evaluation.
	1828	+
	1829	+##### Methodology
	1830	+
	1831	+To detect whether the hardened GHASH implementation effectively mitigates SCA attacks, the Test Vector Leakage Assessment (TVLA) approach discussed by Rambus in a [whitepaper](https://www.rambus.com/wp-content/uploads/2015/08/TVLA-DTR-with-AES.pdf) is adapted for the GCM mode of AES.
	1832	+In TVLA, Welch’s t-test is used to determine whether it is possible to statistically distinguish two power trace sets from each other.
	1833	+This test returns a value t for each sample, where a value of \|t\| > 4.5 means that, with a high probability, a data dependent leakage was detected.
	1834	+However, note that this test cannot provide any information whether the leakage is actually exploitable.
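
The per-sample test can be sketched as follows. This is a minimal, pure-Python illustration of Welch's t-test applied to two trace sets, not the ot-sca implementation; the function names `welch_t` and `leaking_samples` are hypothetical, and each trace is assumed to be a list of samples of equal length.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(fixed_traces, random_traces):
    """Return Welch's t value for every sample index.

    Each argument is a list of traces; each trace is a list of samples.
    """
    t_values = []
    # zip(*traces) regroups the data so that each item holds one sample
    # position across all traces of a set.
    for a, b in zip(zip(*fixed_traces), zip(*random_traces)):
        denom = sqrt(variance(a) / len(a) + variance(b) / len(b))
        t_values.append((mean(a) - mean(b)) / denom if denom else 0.0)
    return t_values


def leaking_samples(t_values, threshold=4.5):
    """Indices of samples whose |t| exceeds the TVLA threshold."""
    return [i for i, t in enumerate(t_values) if abs(t) > threshold]


# Toy data: sample 0 is identically distributed in both sets, sample 1 is not.
fixed = [[1.0, 5.0], [1.1, 5.1], [0.9, 4.9], [1.0, 5.0]]
rand = [[1.0, 1.0], [0.9, 1.1], [1.1, 0.9], [1.0, 1.0]]
print(leaking_samples(welch_t(fixed, rand)))  # prints [1]
```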
	1835	+
	1836	+![](../images/caliptra-rtl/docs/images/GHASH_TVLA_Figure4.png)
	1837	+:--
	1838	+Figure 4: TVLA plot showing leakage at around sample 1000. When increasing the number of traces (from 1000 to 10000), the leakage becomes more present. Note that the traces shown in this plot are taken from an arbitrary cryptographic hardware block and not AES.
	1839	+
	1840	+Figure 4 shows a TVLA plot that will be used throughout this document. The red lines mark the ± t-test border.
	1841	+
	1842	+###### Dataset Generation for FvsR IV & Key
	1843	+
	1844	+In TVLA, two different trace data sets need to be recorded.
	1845	+As described in the [whitepaper](https://www.rambus.com/wp-content/uploads/2015/08/TVLA-DTR-with-AES.pdf), we generate these two trace data sets by using a fixed and a random AES-GCM cipher input set, i.e., the fixed and the random set.
	1846	+
	1847	+\| Input \| Fixed Set \| Random Set \|
	1848	+\| --- \| --- \| --- \|
	1849	+\| Key \| STATIC \| RANDOM \|
	1850	+\| IV \| STATIC \| RANDOM \|
	1851	+\| PTX \| STATIC \| STATIC \|
	1852	+\| AAD \| STATIC \| STATIC \|
	1853	+
	1854	+
	1855	+As shown in the table above, for our experiment we use a static cipher input for the fixed set.
	1856	+For the random set, we use a PRNG to randomly generate the secrets, i.e., key and IV, for each encryption.
	1857	+The dataset is generated directly on the device in the pentest framework.
	1858	+For each trace, ot-sca records which dataset the trace belongs to.
	1859	+
	1860	+With TVLA, the idea is to check whether we are able to distinguish power traces from the fixed and the random set.
	1861	+
	1862	+###### Dataset Generation for FvsR PTX & AAD
	1863	+
	1864	+For the second experiment, we use a static IV and key and calculate a FvsR PTX and AAD set:
	1865	+
	1866	+\| Input \| Fixed Set \| Random Set \|
	1867	+\| --- \| --- \| --- \|
	1868	+\| Key \| STATIC \| STATIC \|
	1869	+\| IV \| STATIC \| STATIC \|
	1870	+\| PTX \| STATIC \| RANDOM \|
	1871	+\| AAD \| STATIC \| RANDOM \|
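
The two tables can be summarized in a short sketch. It only illustrates which inputs vary in which experiment; the actual datasets are generated on the device by the pentest framework. All names (`FIXED`, `RANDOMIZED`, `make_input`), the all-zero static values, and the input lengths are placeholders chosen for illustration.

```python
import random

# Placeholder static inputs; the real fixed values are not given in this document.
FIXED = {"key": bytes(16), "iv": bytes(12), "ptx": bytes(16), "aad": bytes(16)}

# Inputs drawn at random in the random set of each experiment.
RANDOMIZED = {"iv_key": ("key", "iv"), "ptx_aad": ("ptx", "aad")}


def make_input(experiment, use_random_set, rng):
    """Build one AES-GCM cipher input for the fixed or the random set."""
    inputs = dict(FIXED)
    if use_random_set:
        for name in RANDOMIZED[experiment]:
            length = len(FIXED[name])
            inputs[name] = bytes(rng.getrandbits(8) for _ in range(length))
    return inputs


rng = random.Random(0)
fixed_input = make_input("iv_key", False, rng)   # all inputs static
random_input = make_input("iv_key", True, rng)   # key and IV random, PTX and AAD static
```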
	1872	+
	1873	+
	1874	+##### Results – FvsR IV & Key
	1875	+
	1876	+In the following, we discuss the analysis results for each GCM phase.
	1877	+We start with the results for the FvsR IV & Key datasets.
	1878	+
	1879	+![](../images/caliptra-rtl/docs/images/GHASH_TVLA_Figure5.png)
	1880	+:--:
	1881	+Figure 5: AES-GCM block diagram. Red lines mark the trigger windows for each analysis step.
	1882	+
	1883	+As shown in Figure 5, we focus on analyzing (i) the generation of the hash subkey H, (ii) the encryption of the initial counter block S, (iii) the processing of the AAD blocks, (iv) the plaintext blocks, and (v) the tag generation. Each measurement is conducted with (a) masks off and (b) masks on to analyze the effectiveness of the masking countermeasure.
	1884	+
	1885	+###### i) SCA Evaluation of Generating the Hash Subkey H
	1886	+
	1887	+![](../images/caliptra-rtl/docs/images/GHASH_TVLA_Figure6ab.png)
	1888	+:--:
	1889	+
	1890	+\| Figure 6a: Masking Off - 100k traces - Figure 6b: Masking On - 1M traces \|
	1891	+
	1892	+
	1893	+###### Interpretation
	1894	+
	1895	+The AES encryption is clearly visible in the form of 12 distinct peaks in the power traces shown in Figures 6a and 6b.
	1896	+The 12 peaks correspond to first the loading of the key and the all-zero block into the AES cipher core, followed by the initial round and the 10 full AES rounds (AES-128).
	1897	+They spread over approximately 470 samples which corresponds to the 56 target clock cycles a full AES-128 encryption takes.
	1898	+
	1899	+If the masking is turned off (Figure 6a), first and second-order leakage is clearly visible throughout the operation.
	1900	+If the masking is on (Figure 6b), there is first-order leakage 1) at the beginning as well as 2) at the end of the operation.
	1901	+
	1902	+1. The leakage at the beginning of the operation is due to incrementing the IV/CTR value (inc32 function in GCM spec) which spreads across the first two AES rounds.
	1903	+ This produces first-order leakage as the inc32 function implementation isn’t masked.
	1904	+ It doesn’t need to be masked as the IV is not secret, just the encrypted initial counter block S (i.e., the encrypted IV) is secret in the context of GCM.
	1905	+2. The leakage at the end of the operation happens when the masked output of the AES cipher core, i.e., the masked hash subkey H, gets loaded in shares into the GHASH block.
	1906	+ When studying the RTL, one can see that there is nothing in the path between the AES cipher core and the hash subkey registers inside the GHASH block that could combine the shares and cause this leakage.
	1907	+ The leakage is most likely due to how the FPGA implementation tool maps the flip flops of the hash subkey register shares to the available FPGA logic slices: if flip flops of the different shares get mapped to the same logic slice, the carry-chain and other muxing logic present in the logic slice can combine the various inputs thereby causing SCA leakage despite these logic outputs not being used.
	1908	+ We’ve observed similar effects in the past and there is [research giving more insight into this and other FPGA-specific issues](https://ieeexplore.ieee.org/document/10545383).
	1909	+
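For reference, the inc32 function mentioned above can be sketched in a few lines of Python. The sketch follows the definition in the GCM specification (NIST SP 800-38D): only the rightmost 32 bits of the 128-bit counter block are incremented modulo 2^32, and the leftmost 96 bits stay unchanged. The function name and the byte-oriented interface are illustrative assumptions and do not reflect the RTL.

```python
def inc32(block: bytes) -> bytes:
    """Increment the rightmost 32 bits of a 16-byte block modulo 2**32."""
    if len(block) != 16:
        raise ValueError("counter block must be 16 bytes")
    prefix = block[:12]                            # leftmost 96 bits, untouched
    counter = int.from_bytes(block[12:], "big")    # rightmost 32 bits
    counter = (counter + 1) % (1 << 32)            # wrap around on overflow
    return prefix + counter.to_bytes(4, "big")


# Example with an arbitrary 96-bit IV followed by a 32-bit counter of 1.
iv = bytes.fromhex("deadbeefcafebaadf0cacc1a")
j0 = iv + (1).to_bytes(4, "big")
assert inc32(j0) == iv + (2).to_bytes(4, "big")
```

Because the operation depends only on the public IV/CTR value, a sketch like this also illustrates why the increment itself does not process any secret data.
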
	1910	+To summarize, the observed first-order leakage if masking is on (Figure 6b) is not of concern for ASIC implementations.
	1911	+
	1912	+###### ii) SCA Evaluation of Encrypting the Initial Counter Block
	1913	+
	1914	+![](../images/caliptra-rtl/docs/images/GHASH_TVLA_Figure7ab.png)
	1915	+:--:
	1916	+
	1917	+\| Figure 7a: Masking Off - 100k traces - Figure 7b: Masking On - 1M traces \|
	1918	+
	1919	+
	1920	+###### Interpretation
	1921	+
	1922	+Again, the AES encryption is clearly visible in the form of 12 peaks in the power traces shown in Figures 7a and 7b.
	1923	+This AES encryption corresponds to the generation of the encrypted initial counter block S.
	1924	+The AES encryption is followed by another operation visible in the power trace: the computation of repeatedly used correction terms using the Galois-field multipliers inside GHASH.
	1925	+This operation takes 33 target clock cycles (approximately 275 samples).
	1926	+
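The sample and cycle counts quoted for the AES encryption (approximately 470 samples for 56 target clock cycles) and for the correction-term computation (approximately 275 samples for 33 target clock cycles) can be cross-checked with a short calculation. The helper below is a hypothetical convenience function, not part of any tooling, and the inputs are the approximate values read from the plots.

```python
def samples_per_cycle(samples: int, cycles: int) -> float:
    """Approximate number of captured samples per target clock cycle."""
    return samples / cycles


aes_ratio = samples_per_cycle(470, 56)     # full AES-128 encryption
ghash_ratio = samples_per_cycle(275, 33)   # correction-term computation

print(f"AES:   {aes_ratio:.2f} samples per target clock cycle")
print(f"GHASH: {ghash_ratio:.2f} samples per target clock cycle")
```

Both ratios come out between roughly 8.3 and 8.4, so the two activity blocks are consistent with the same sampling rate relative to the target clock.
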
	1927	+If the masking is turned off (Figure 7a), first- and second-order leakage is clearly visible throughout both operations while being more pronounced during the GHASH operation.
	1928	+This is because the GHASH block is smaller and thus produces less noise.
	1929	+If the masking is on (Figure 7b), there is first-order leakage 1) at the beginning as well as 2) between the two operations.
	1930	+
	1931	+1. As before, the leakage at the beginning of the operation is due to incrementing the IV/CTR value (inc32 function in GCM spec) which spreads across the first two AES rounds.
	1932	+ This produces first-order leakage as the inc32 function implementation isn’t masked.
	1933	+ It doesn’t need to be masked as the IV is not secret, just the encrypted initial counter block S (i.e., the encrypted IV) is secret in the context of GCM.
	1934	+2. As before, the leakage at the end of the operation happens when the masked output of the AES cipher core, i.e., the encrypted initial counter block S, gets loaded in shares into the GHASH block.
	1935	+ When studying the RTL, one can see that there is nothing in the path between the AES cipher core and the GHASH state registers inside the GHASH block that could combine the shares and cause this leakage.
	1936	+ As before, the leakage is most likely due to how the FPGA implementation tool maps the multiplexers in front of the GHASH state registers to the available FPGA logic slices: Since the multiplexers for both shares use the same control signals, the multiplexing logic can be combined even into the same look-up tables (LUTs) thereby causing SCA leakage.
	1937	+ We’ve observed similar effects in the past and there is [research giving more insight into this and other FPGA-specific issues](https://ieeexplore.ieee.org/document/10545383).
	1938	+
	1939	+To summarize, the observed first-order leakage if masking is on (Figure 7b) is not of concern for ASIC implementations.
	1940	+
	1941	+###### iii) SCA Evaluation of Processing the AAD Blocks
	1942	+
	1943	+###### Processing AAD Block 0
	1944	+
	1945	+![](../images/caliptra-rtl/docs/images/GHASH_TVLA_Figure8ab.png)
	1946	+:--:
	1947	+
	1948	+\| Figure 8a: Masking Off - 50k traces - Figure 8b: Masking On - 10M traces \|
	1949	+
	1950	+
	1951	+###### Interpretation
	1952	+
	1953	+For AAD blocks, the AES cipher core is not involved.
	1954	+However, during the computation of the first AAD block, the GHASH block needs to compute an additional correction term which is used for the very first block only.
	1955	+If the masking is turned off (Figure 8a), first- and second-order leakage is clearly visible but only for the first activity block.
	1956	+The second activity block involves computing the additional correction terms which requires Share 1 of the encrypted initial counter block to be multiplied by Share 1 of the hash subkey.
	1957	+But since the masking is off, both these values are zero for both the fixed and the random set and hence there is no SCA leakage.
	1958	+If the masking is turned on (Figure 8b), no SCA leakage is observable which is desirable.
	1959	+
	1960	+###### Processing AAD Block 1
	1961	+
	1962	+![](../images/caliptra-rtl/docs/images/GHASH_TVLA_Figure9ab.png)
	1963	+:--:
	1964	+
	1965	+\| Figure 9a: Masking Off - 50k traces - Figure 9b: Masking On - 10M traces \|
	1966	+
	1967	+
	1968	+###### Interpretation
	1969	+
	1970	+For the second AAD block (and any subsequent AAD blocks) there is only one activity block corresponding to the Galois-field multiplication.
	1971	+If masking is turned off (Figure 9a), there is both first- and second-order leakage observable.
	1972	+If the masking is turned on (Figure 9b), no SCA leakage is observable which is desirable.
	1973	+
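To illustrate why a Galois-field multiplication lends itself to processing a secret in shares, the following Python sketch implements the GF(2^128) multiplication as defined in the GCM specification (NIST SP 800-38D) and checks that multiplying by a Boolean-shared hash subkey share by share gives the same result as multiplying by the unshared value. This is a simplified software model under stated assumptions: the function and variable names are illustrative, and the actual masked GHASH hardware, including its correction terms, is more involved than this.

```python
import secrets

R = 0xE1 << 120  # reduction constant of the GCM field polynomial


def gf128_mul(x: int, y: int) -> int:
    """Multiply two 128-bit blocks in GF(2^128) using GCM bit ordering."""
    z, v = 0, y
    for i in range(128):
        if (x >> (127 - i)) & 1:   # bit i of x, counted from the MSB
            z ^= v
        v = (v >> 1) ^ R if v & 1 else v >> 1
    return z


# Split a random hash subkey into two Boolean shares: h = share0 XOR share1.
h = secrets.randbits(128)
h_share1 = secrets.randbits(128)
h_share0 = h ^ h_share1

# The multiplication distributes over XOR, so each share can be processed
# separately and the partial results recombined afterwards.
x = secrets.randbits(128)
assert gf128_mul(x, h) == gf128_mul(x, h_share0) ^ gf128_mul(x, h_share1)
```
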
	1974	+###### iv) SCA Evaluation of Processing the PTX Blocks
	1975	+
	1976	+###### Processing PTX Block 0
	1977	+
	1978	+![](../images/caliptra-rtl/docs/images/GHASH_TVLA_Figure10ab.png)
	1979	+:--:
	1980	+
	1981	+\| Figure 10a: Masking Off - 50k traces - Figure 10b: Masking On - 1M traces \|
	1982	+
	1983	+
	1984	+###### Interpretation
	1985	+
	1986	+Like in [ii) SCA Evaluation of Encrypting the Initial Counter Block](#ii-sca-evaluation-of-encrypting-the-initial-counter-block) there is first-order leakage 1) at the beginning and 2) between the two operations if the masking is turned on (Figure 10b).
	1987	+
	1988	+1. As before, the leakage at the beginning of the operation is due to incrementing the IV/CTR value (inc32 function in GCM spec) which spreads across the first two AES rounds.
	1989	+ This produces first-order leakage as the inc32 function implementation isn’t masked.
	1990	+ It doesn’t need to be masked as the IV is not secret, just the encrypted initial counter block S (i.e., the encrypted IV) is secret in the context of GCM.
	1991	+2. The leakage between the two operations is due to the unmasking of the AES cipher core output, the addition of input data to produce the ciphertext, and writing this value to the GHASH block and the output data registers.
	1992	+ It’s not related to the hash subkey H or the initial counter block S (i.e. the two secrets involved in the GHASH part of GCM).
	1993	+ But since the AAD and the plaintext have been chosen to be the same for all traces in the fixed and the random sets, the traces of the fixed set only produce all the same ciphertext and thus are expected to exhibit a static power signature for this step, whereas the ciphertext of the random set is randomized through the random key and IV.
	1994	+ However, since the ciphertext is not secret in the context of GCM, this leakage is of no concern.
	1995	+
	1996	+To summarize, the observed first-order leakage if masking is on (Figure 10b) is not of concern.
	1997	+
	1998	+###### Processing PTX Block 1
	1999	+
	2000	+![](../images/caliptra-rtl/docs/images/GHASH_TVLA_Figure11ab.png)
	2001	+:--:
	2002	+
	2003	+\| Figure 11a: Masking Off - 50k traces - Figure 11b: Masking On - 1M traces \|
	2004	+
	2005	+
	2006	+###### Interpretation
	2007	+
	2008	+As before (PTX Block 0), there is some first-order leakage observable when the masking is turned on.
	2009	+For the same reasons as before, this leakage is not of concern.
	2010	+
	2011	+###### v) SCA Evaluation of the Tag Generation
	2012	+
	2013	+![](../images/caliptra-rtl/docs/images/GHASH_TVLA_Figure12ab.png)
	2014	+:--:
	2015	+
	2016	+\| Figure 12a: Masking Off - 50k traces - Figure 12b: Masking On - 1M traces \|
	2017	+
	2018	+
	2019	+###### Interpretation
	2020	+
	2021	+The generation of the final authentication tag consists of two operations.
	2022	+1) The 128-bit block containing the AAD and ciphertext lengths is hashed and the correction terms are added.
	2023	+ The GHASH state is unmasked (still masked with the encrypted initial counter block S) and Share 1 of S is added to write the final authentication tag to the data output registers readable by software.
	2024	+2) In parallel to writing the final authentication tag to the data output registers, the internal state is all cleared to random values and an additional multiplication is triggered to clear the internal state of the Galois-field multipliers and the correction term registers.
	2025	+
	2026	+If masking is turned off (Figure 12a), there is both first- and second-order leakage observable during the first activity block (tag generation) but not during the clearing operation.
	2027	+If the masking is turned on (Figure 12b), some SCA leakage is observable between the two operations, i.e., when the final authentication tag is written to the output data registers.
	2028	+This leakage is expected as both the fixed and the random data sets use a static AAD and plaintext.
	2029	+This means the tag for the fixed data set is fixed, whereas the tags for the random set get randomized through the ciphertext (random due to the random key and IV).
	2030	+
	2031	+To summarize, the observed first-order leakage if masking is on (Figure 12b) is not of concern.
	2032	+
	2033	+##### Results – FvsR PTX & AAD
	2034	+
	2035	+In the following, we discuss the analysis results for the FvsR PTX & AAD datasets.
	2036	+These experiments were specifically done to investigate leakage peaks identified for the FvsR Key & IV datasets that are attributed to how the FPGA implementation tool maps flip flops and multiplexer shares to the available FPGA logic slices.
	2037	+
	2038	+###### i) SCA Evaluation of Generating the Hash Subkey H
	2039	+
	2040	+![](../images/caliptra-rtl/docs/images/GHASH_TVLA_Figure13ab.png)
	2041	+:--:
	2042	+
	2043	+\| Figure 13a: Masking Off - 50k traces - Figure 13b: Masking On - 1M traces \|
	2044	+
	2045	+
	2046	+###### Interpretation
	2047	+
	2048	+No SCA leakage is visible in either case, i.e., without masking (Figure 13a) or with masking turned on (Figure 13b).
	2049	+This is expected as the hash subkey generation doesn’t involve the plaintext and the AAD but only the key and IV.
	2050	+Both the fixed and random set use the same static key and IV.
	2051	+
	2052	+This experiment was specifically done to check whether the leakage identified in Figure 6b, which was attributed to how the FPGA implementation tool maps the flip flops of the hash subkey register shares to the available FPGA logic slices, disappears when the key and IV are static.
	2053	+As expected, the leakage peak is now gone.
	2054	+
	2055	+###### ii) SCA Evaluation of Encrypting the Initial Counter Block
	2056	+
	2057	+![](../images/caliptra-rtl/docs/images/GHASH_TVLA_Figure14ab.png)
	2058	+:--:
	2059	+
	2060	+\| Figure 14a: Masking Off - 50k traces - Figure 14b: Masking On - 1M traces \|
	2061	+
	2062	+
	2063	+###### Interpretation
	2064	+
	2065	+No SCA leakage is visible in either case, i.e., without masking (Figure 14a) or with masking turned on (Figure 14b).
	2066	+This is expected as the encryption of the initial counter block and the subsequent computation of repeatedly used correction terms doesn’t involve the plaintext and the AAD but only the key and IV.
	2067	+Both the fixed and random set use the same static key and IV.
	2068	+
	2069	+This experiment was specifically done to check whether the leakage identified in Figure 7b, which was attributed to how the FPGA implementation tool maps the multiplexers in front of the GHASH state registers to the available FPGA logic slices, disappears when the key and IV are static.
	2070	+As expected, the leakage peak is now gone.
	2071	+
	2072	+###### iv) SCA Evaluation of Processing the PTX Block 0
	2073	+
	2074	+![](../images/caliptra-rtl/docs/images/GHASH_TVLA_Figure15ab.png)
	2075	+:--:
	2076	+
	2077	+\| Figure 15a: Masking Off - 100k traces - Figure 15b: Masking On - 1M traces \|
	2078	+
	2079	+
	2080	+###### Interpretation
	2081	+
	2082	+With the masking turned off (Figure 15a), there is first-order leakage 1) at the beginning of the operation and 2) throughout the entire GHASH operation.
	2083	+
	2084	+1. The leakage at the beginning of the operation is due to the input data (the plaintext) being written to an internal buffer register.
	2085	+ The AES cipher is operated in counter mode, meaning it doesn’t encrypt the input data but the counter value (incremented IV).
	2086	+ Because the IV is fixed for both the fixed and the random data set, no leakage is observed during the AES encryption even if the masking is off.
	2087	+ At the end of the AES encryption, the output of the AES cipher core is added to the content of the buffer register to produce the ciphertext which is then forwarded to the GHASH block and to the data output registers.
	2088	+2. The GHASH operation then processes this ciphertext.
	2089	+ The observed leakage when the masking is off is expected.
	2090	+
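The counter-mode step described above can be modeled in a few lines of Python. The sketch assumes a placeholder value for the AES cipher core output (the keystream block); no real AES is computed, and all names are illustrative. It shows that with a fixed key and IV, and therefore a fixed keystream, any difference between two ciphertexts is exactly the difference between the two plaintexts.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR, i.e., the addition used to form the ciphertext."""
    return bytes(p ^ q for p, q in zip(a, b))


# Placeholder for the AES cipher core output; identical for both data sets
# because the key and the IV are static in the FvsR PTX & AAD experiments.
keystream = bytes(range(16))

ptx_fixed = bytes([0xAA] * 16)    # example plaintext block of the fixed set
ptx_random = bytes([0x55] * 16)   # example plaintext block of the random set

ctx_fixed = xor_bytes(keystream, ptx_fixed)
ctx_random = xor_bytes(keystream, ptx_random)

# The keystream cancels out: the sets differ only through the plaintext.
assert xor_bytes(ctx_fixed, ctx_random) == xor_bytes(ptx_fixed, ptx_random)
```
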
	2091	+With the masking turned on (Figure 15b), the first-order leakage at the beginning of the operation remains visible. The reason for this is that the internal register buffering the previous input data is not masked.
	2092	+This is of no concern as the leakage is not related to key or IV.
	2093	+
	2094	+Another first-order leakage peak is visible between the AES encryption and the GHASH operation.
	2095	+This leakage is due to the unmasked AES cipher core output being added to the input data (coming from the internal buffer register) and the result being stored to the output data register.
	2096	+As key and IV are static and identical for both the fixed and the random data set, the cipher core output is the same for both sets.
	2097	+Any difference in the power signature between the two sets is due to the different plaintext / ciphertext.
	2098	+Again, this is to be expected and of no concern as the ciphertext is not secret in the context of GCM.
	2099	+
	2100	+#### Reproducing the FPGA Experiments
	2101	+
	2102	+##### Prerequisites
	2103	+
	2104	+###### (i) Setting up the CW310 and CW Husky
	2105	+
	2106	+Please follow the guide [here](https://github.com/lowRISC/ot-sca/blob/master/doc/getting_started.md#cw310) to prepare the CW310 and CW Husky for the SCA measurements.
	2107	+
	2108	+###### (ii) Generating the FPGA Bitstream
	2109	+
	2110	+Follow the guide [here](https://opentitan.org/book/doc/getting_started/install_vivado/index.html) to install Xilinx Vivado. Please note that a valid license is needed to generate bitstreams for the CW310 FPGA board.
	2111	+
	2112	+Then, build the bitstream from the [aes-gcm-sca-bitstream](https://github.com/vogelpi/opentitan/tree/aes-gcm-sca-bitstream) branch.
	2113	+This branch includes the AES-GCM and applies several optimizations (disabling certain features to reduce the area utilization) to improve the SCA measurements.
	2114	+```sh
	2115	+git clone https://github.com/vogelpi/opentitan.git
	2116	+cd opentitan
	2117	+git checkout aes-gcm-sca-bitstream
	2118	+./bazelisk.sh build //hw/bitstream/vivado:fpga_cw310_test_rom
	2119	+cp bazel-bin/hw/bitstream/vivado/build.fpga_cw310/synth-vivado/lowrisc_systems_chip_earlgrey_cw310_0.1.bit .
	2120	+```
	2121	+
	2122	+The resulting bitstream is `lowrisc_systems_chip_earlgrey_cw310_0.1.bit`.
	2123	+
	2124	+###### (iii) Compiling the Penetration Testing Binary
	2125	+
	2126	+The penetration testing binary that is running on the target is the framework that receives commands from the side-channel evaluation framework and triggers the AES-GCM operations.
	2127	+```sh
	2128	+git clone https://github.com/vogelpi/opentitan.git
	2129	+cd opentitan
	2130	+git checkout aes-gcm-review
	2131	+./bazelisk.sh build //sw/device/tests/penetrationtests/firmware:firmware_fpga_cw310_test_rom
	2132	+cp bazel-bin/sw/device/tests/penetrationtests/firmware/firmware_fpga_cw310_test_rom_fpga_cw310_test_rom.bin sca_ujson_fpga_cw310.bin
	2133	+```
	2134	+
	2135	+The resulting penetration testing binary is `sca_ujson_fpga_cw310.bin`.
	2136	+
	2137	+###### (iv) Setting up the Side-Channel Evaluation Framework
	2138	+
	2139	+Clone the ot-sca repository and switch to the dedicated AES-GCM branch:
	2140	+```sh
	2141	+git clone https://github.com/lowRISC/ot-sca.git
	2142	+cd ot-sca
	2143	+git checkout ot-sca-aes-gcm
	2144	+```
	2145	+
	2146	+Then, follow [this](https://github.com/lowRISC/ot-sca/blob/master/doc/getting_started.md#installing-on-a-machine) guideline to prepare your machine for the measurements.
	2147	+
	2148	+Afterwards, copy the bitstream to `ot-sca/objs/lowrisc_systems_chip_earlgrey_cw310_0.1.bit` and the binary to `ot-sca/objs/sca_ujson_fpga_cw310.bin`.
	2149	+
	2150	+Finally, determine the port the CW310 opened on your machine (e.g., `/dev/ttyACM2`) and set it accordingly in the `port` field of the `ot-sca/capture/configs/aes_gcm_sca_cw310.yaml` configuration file.
	2151	+
	2152	+##### Capturing Traces
	2153	+
	2154	+After fulfilling the prerequisites, traces can be captured using ot-sca.
	2155	+To configure the measurement, adapt the configuration file located in `ot-sca/capture/configs/aes_gcm_sca_cw310.yaml`.
	2156	+The following parameters can be changed:
	2157	+```yml
	2158	+husky:
	2159	+ # Number of encryptions performed in one batch.
	2160	+ num_segments: 35
	2161	+ # Number of cycles that are captured by the CW Husky.
	2162	+ num_cycles: 320
	2163	+capture:
	2164	+ # Number of traces to capture.
	2165	+ num_traces: 100000
	2166	+ # Number of traces to keep in memory before flushing to the disk.
	2167	+ trace_threshold: 50000
	2168	+test:
	2169	+ # Values used for the fixed set.
	2170	+ iv_fixed: [0xDE, 0xAD, 0xBE, 0xEF, 0xCA, 0xFE, 0xBA, 0xAD, 0xF0, 0xCA,
	2171	+ 0xCC, 0x1A, 0x00, 0x00, 0x00, 0x00]
	2172	+ key_fixed: [0x81, 0x1E, 0x37, 0x31, 0xB0, 0x12, 0x0A, 0x78, 0x42, 0x78,
	2173	+ 0x1E, 0x22, 0xB2, 0x5C, 0xDD, 0xF9]
	2174	+ # Static values that are used by the fixed and the random set.
	2175	+ ptx_blocks: 2
	2176	+ ptx_static: [[0xAA, 0xAA, 0xAA, 0xAA, 0xAA, 0xAA, 0xAA, 0xAA, 0xAA, 0xAA,
	2177	+ 0xAA, 0xAA, 0xAA, 0xAA, 0xAA, 0xAA], [0xBB, 0xBB, 0xBB,
	2178	+ 0xBB, 0xBB, 0xBB, 0xBB, 0xBB, 0xBB, 0xBB, 0xBB, 0xBB, 0xBB,
	2179	+ 0xBB, 0xBB, 0xBB]]
	2180	+ ptx_last_block_len_bytes: 16
	2181	+ aad_blocks: 2
	2182	+ aad_static: [[0xCC, 0xCC, 0xCC, 0xCC, 0xCC, 0xCC, 0xCC, 0xCC, 0xCC, 0xCC,
	2183	+ 0xCC, 0xCC, 0xCC, 0xCC, 0xCC, 0xCC], [0xDD, 0xDD, 0xDD,
	2184	+ 0xDD, 0xDD, 0xDD, 0xDD, 0xDD, 0xDD, 0xDD, 0xDD, 0xDD,
	2185	+ 0xDD, 0xDD, 0xDD, 0xDD]]
	2186	+ aad_last_block_len_bytes: 16
	2187	+ # Trigger configuration (select only one).
	2188	+ # [Hash sub key, Init. block, AAD block, PTX block, TAG block]
	2189	+ triggers: [False, False, False, False, True]
	2190	+ # Which AAD or PTX block. 0 = first block.
	2191	+ trigger_block: 0
	2192	+ # 32-bit seed for masking on device. To switch off the masking, use 0
	2193	+ # as an LFSR seed.
	2194	+ lfsr_seed: 0x00000000
	2195	+ #lfsr_seed: 0xdeadbeef
	2196	+```
	2197	+
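Since the comments in the configuration state that only one trigger may be selected and that an LFSR seed of 0 switches the masking off, a small sanity check before starting a long capture can be useful. The following Python sketch is a hypothetical helper that is not part of ot-sca. It operates on a plain dictionary mirroring the `test` section shown above, and the trigger names are assumptions derived from the comment in the configuration.

```python
# Order taken from the comment in the configuration file:
# [Hash sub key, Init. block, AAD block, PTX block, TAG block]
TRIGGER_NAMES = ["hash_subkey", "init_block", "aad_block", "ptx_block", "tag_block"]


def selected_trigger(test_cfg: dict) -> str:
    """Return the name of the single selected trigger or raise ValueError."""
    triggers = test_cfg["triggers"]
    if len(triggers) != len(TRIGGER_NAMES):
        raise ValueError("expected one flag per trigger")
    if sum(bool(flag) for flag in triggers) != 1:
        raise ValueError("exactly one trigger must be selected")
    return TRIGGER_NAMES[triggers.index(True)]


def masking_enabled(test_cfg: dict) -> bool:
    """An LFSR seed of 0 switches the masking off."""
    return test_cfg["lfsr_seed"] != 0


# Values mirror the example configuration above.
test_cfg = {
    "triggers": [False, False, False, False, True],
    "trigger_block": 0,
    "lfsr_seed": 0x00000000,
}
print(selected_trigger(test_cfg), masking_enabled(test_cfg))
```
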
	2198	+After tweaking the configuration, the traces can be captured by executing:
	2199	+
	2200	+```sh
	2201	+cd capture
	2202	+./capture_aes_gcm.py -c configs/aes_gcm_sca_cw310.yaml -p aes_gcm_sca
	2203	+```
	2204	+
	2205	+Here, `-c` specifies the configuration file and `-p` the project database in which the traces are stored.
	2206	+
	2207	+##### Performing the TVLA
	2208	+
	2209	+After capturing the traces, the TVLA can be performed by switching into the `ot-sca/analysis` folder, copying the `ot-sca/analysis/configs/tvla_cfg_kmac.yaml` file to `ot-sca/analysis/configs/tvla_cfg_aes_gcm.yaml`, and modifying the configuration file:
	2210	+```yml
	2211	+project_file: ../capture/projects/aes_gcm_sca
	2212	+trace_file: null
	2213	+trace_start: null
	2214	+trace_end: null
	2215	+leakage_file: null
	2216	+save_to_disk: null
	2217	+save_to_disk_ttest: null
	2218	+round_select: null
	2219	+byte_select: null
	2220	+input_histogram_file: null
	2221	+output_histogram_file: null
	2222	+number_of_steps: 1
	2223	+ttest_step_file: null
	2224	+plot_figures: true
	2225	+test_type: "GENERAL_KEY"
	2226	+mode: aes
	2227	+filter_traces: true
	2228	+trace_threshold: 50000
	2229	+trace_db: ot_trace_library
	2230	+```
	2231	+
	2232	+By calling
	2233	+```sh
	2234	+./tvla.py --cfg-file tvla_cfg_aes_gcm.yaml run-tvla
	2235	+```
	2236	+the TVLA plot is generated.
	2237	+
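To make the idea behind the TVLA more tangible, the following Python sketch computes Welch's t statistic per sample point for a fixed and a random trace set and flags the points whose absolute value exceeds the commonly used threshold of 4.5. This is a minimal illustration on synthetic data using only the standard library. It is not the ot-sca implementation, and the function name, trace shapes, and noise parameters are assumptions.

```python
import random
from statistics import mean, variance

TVLA_THRESHOLD = 4.5  # commonly used pass/fail bound for |t|


def welch_t(fixed_traces, random_traces):
    """Return Welch's t statistic for every sample point of the two trace sets."""
    t_values = []
    for fixed_col, random_col in zip(zip(*fixed_traces), zip(*random_traces)):
        std_err = (variance(fixed_col) / len(fixed_col)
                   + variance(random_col) / len(random_col)) ** 0.5
        t_values.append((mean(fixed_col) - mean(random_col)) / std_err)
    return t_values


# Synthetic stand-in for captured traces: three sample points per trace, and
# only sample point 1 of the fixed set carries a data-dependent offset.
rng = random.Random(1234)
fixed_set = [[rng.gauss(0, 1), rng.gauss(0.5, 1), rng.gauss(0, 1)]
             for _ in range(2000)]
random_set = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(2000)]

t_values = welch_t(fixed_set, random_set)
leaking_points = [i for i, t in enumerate(t_values) if abs(t) > TVLA_THRESHOLD]
print(leaking_points)
```

A second-order analysis can be approximated with the same function by first mean-centering each sample point and squaring the result before running the test.
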
1459	2238	## PCR vault
1460	2239
1461	2240	* Platform Configuration Register (PCR) vault is a register file that stores measurements to be used by the microcontroller.
1462		-* PCR entries are read-only registers of 384 bits each.
	2241	+* PCR entries are read-only registers of 512 bits each.
1463	2242	* Control bits allow for entries to be cleared by FW, which sets their values back to 0.
1464	2243	* A lock bit can be set by FW to prevent the entry from being cleared. The lock bit is sticky and only resets on a powergood cycle.
1465	2244
@@ -1490,23 +2269,23 @@
1490	2269
1491	2270	## Key vault
1492	2271
1493		-Key Vault (KV) is a register file that stores the keys to be used by the microcontroller, but this register file is not observed by the microcontroller. Each cryptographic function has a control register and functional block designed to read from and write to the KV.
	2272	+Key Vault (KV) is a register file that stores the keys to be used by the microcontroller, but this register file is not observed by the microcontroller. Each cryptographic function has a control register and functional block designed to read from and write to the KV.
1494	2273
1495	2274	\| KV register \| Description \|
1496	2275	\| :-------------------------------- \| :-------------------------------------------------------- \|
1497		-\| Key Control\[31:0\] \| 32 Control registers, 32 bits each \|
1498		-\| Key Entry\[31:0\]\[11:0\]\[31:0\] \| 32 Key entries, 384 bits each <br>No read or write access \|
	2276	+\| Key Control\[23:0\] \| 24 Control registers, 32 bits each \|
	2277	+\| Key Entry\[23:0\]\[15:0\]\[31:0\] \| 24 Key entries, 512 bits each <br>No read or write access \|
1499	2278
1500	2279
1501	2280	### Key vault functional block
1502	2281
1503		-Keys and measurements are stored in 512b register files. These have no read or write path from the microcontroller. The entries are read through a passive read mux driven by each cryptographic block. Locked entries return zeroes.
1504		-
1505		-Entries in the KV must be cleared via control register, or by de-assertion of pwrgood.
1506		-
1507		-Each entry has a control register that is writable by the microcontroller.
1508		-
1509		-The destination valid field is programmed by FW in the cryptographic block generating the key, and it is passed here at generation time. This field cannot be modified after the key is generated and stored in the KV.
	2282	+Keys and measurements are stored in 512b register files. These have no read or write path from the microcontroller. The entries are read through a passive read mux driven by each cryptographic block. Locked entries return zeroes.
	2283	+
	2284	+Entries in the KV must be cleared via control register, or by de-assertion of pwrgood.
	2285	+
	2286	+Each entry has a control register that is writable by the microcontroller.
	2287	+
	2288	+The destination valid field is programmed by FW in the cryptographic block generating the key, and it is passed here at generation time. This field cannot be modified after the key is generated and stored in the KV.
1510	2289
1511	2290	\| KV Entry Ctrl Fields \| Reset \| Description \|
1512	2291	\| --------------------------- \| ------------------- \| ------------------------ \|
@@ -1515,11 +2294,11 @@
1515	2294	\| Clear\[2\] \| cptra_rst_b \| If unlocked, setting the clear bit causes KV to clear the associated entry. The clear bit is reset after entry is cleared. \|
1516	2295	\| Copy\[3\] \| cptra_rst_b \| ENHANCEMENT: Setting the copy bit causes KV to copy the key to the entry written to Copy Dest field. \|
1517	2296	\| Copy Dest\[8:4\] \| cptra_rst_b \| ENHANCEMENT: Destination entry for the copy function. \|
1518		-\| Dest_valid\[16:9\] \| hard_reset_b \| KV entry can be used with the associated cryptographic block if the appropriate index is set. <br>\[0\] - HMAC KEY <br>\[1\] - HMAC BLOCK <br>\[2\] - SHA BLOCK <br>\[2\] - ECC PRIVKEY <br>\[3\] - ECC SEED <br>\[7:5\] - RSVD \|
	2297	+\| Dest_valid\[16:9\] \| hard_reset_b \| KV entry can be used with the associated cryptographic block if the appropriate index is set. <br>\[0\] - HMAC KEY <br>\[1\] - HMAC BLOCK <br>\[2\] - MLDSA SEED <br>\[3\] - ECC PRIVKEY <br>\[4\] - ECC SEED <br>\[5\] - AES KEY <br>\[7:6\] - RSVD \|
1519	2298	\| last_dword\[20:19\] \| hard_reset_b \| Store the offset of the last valid dword, used to indicate the last cycle for read operations. \|
1520	2299
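The `Dest_valid[16:9]` encoding from the table above can be decoded as follows. This Python sketch is for illustration only: the function name is hypothetical, and the bit assignments are taken directly from the updated table row (bits 7:6 are reserved).

```python
# Bit positions within Dest_valid, as listed in the KV Entry Ctrl table.
DEST_VALID_NAMES = [
    "HMAC KEY",     # [0]
    "HMAC BLOCK",   # [1]
    "MLDSA SEED",   # [2]
    "ECC PRIVKEY",  # [3]
    "ECC SEED",     # [4]
    "AES KEY",      # [5]
]


def decode_dest_valid(kv_ctrl: int) -> list:
    """List the cryptographic blocks a KV entry may be used with."""
    dest_valid = (kv_ctrl >> 9) & 0xFF  # extract Dest_valid[16:9]
    return [name for bit, name in enumerate(DEST_VALID_NAMES)
            if (dest_valid >> bit) & 1]


# Example: an entry usable as HMAC KEY and AES KEY.
kv_ctrl = (0b100001 << 9)
print(decode_dest_valid(kv_ctrl))
```
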
1521	2300
1522		-### Key vault cryptographic functional block
	2301	+### Key vault cryptographic functional block
1523	2302
1524	2303	A generic block is instantiated in each cryptographic block to enable access to KV.
1525	2304
@@ -1551,10 +2330,11 @@
1551	2330	\| write_entry\[5:1\] \| Key vault entry to store the result. \|
1552	2331	\| hmac_key_dest_valid\[6\] \| HMAC KEY is a valid destination. \|
1553	2332	\| hmac_block_dest_valid\[7\] \| HMAC BLOCK is a valid destination. \|
1554		-\| sha_block_dest_valid\[8\] \| SHA BLOCK is a valid destination. \|
	2333	+\| mldsa_seed_dest_valid\[8\] \| MLDSA SEED is a valid destination. \|
1555	2334	\| ecc_pkey_dest_valid\[9\] \| ECC PKEY is a valid destination. \|
1556	2335	\| ecc_seed_dest_valid\[10\] \| ECC SEED is a valid destination. \|
1557		-\| rsvd\[31:11\] \| Reserved field \|
	2336	+\| aes_key_dest_valid\[11\] \| AES KEY is a valid destination. \|
	2337	+\| rsvd\[31:12\] \| Reserved field \|
1558	2338
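As an illustration of the field layout in the table above, the following Python sketch packs `write_entry[5:1]` and the destination-valid bits `[11:6]` into a register value. The helper name and the short destination keys are hypothetical. Only the fields visible in this table excerpt are modeled, so any bit not listed here (including bit 0) is left at zero, and the range check assumes the 24 key entries described in the KV register table.

```python
# Bit positions of the destination-valid flags, as listed in the table.
DEST_VALID_BITS = {
    "hmac_key": 6,
    "hmac_block": 7,
    "mldsa_seed": 8,
    "ecc_pkey": 9,
    "ecc_seed": 10,
    "aes_key": 11,
}
NUM_KV_ENTRIES = 24  # assumption: 24 key entries, per the KV register table


def pack_kv_write_ctrl(write_entry: int, dests) -> int:
    """Pack write_entry[5:1] and the selected dest_valid bits into one value."""
    if not 0 <= write_entry < NUM_KV_ENTRIES:
        raise ValueError("write_entry out of range")
    value = (write_entry & 0x1F) << 1
    for dest in dests:
        value |= 1 << DEST_VALID_BITS[dest]
    return value


# Example: store the result in entry 5, usable as HMAC key and AES key.
ctrl = pack_kv_write_ctrl(5, ["hmac_key", "aes_key"])
print(hex(ctrl))
```
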
1559	2339
1560	2340	\| KV Status Reg \| Description \|
@@ -1583,12 +2363,12 @@
1583	2363
1584	2364	### Key vault de-obfuscation block operation
1585	2365
1586		-A de-obfuscation engine (DOE) is used in conjunction with AES cryptography to de-obfuscate the UDS and field entropy.
1587		-
1588		-1. The obfuscation key is driven to the AES key. The data to be decrypted (either obfuscated UDS or obfuscated field entropy) is fed into the AES data.
1589		-2. An FSM manually drives the AES engine and writes the decrypted data back to the key vault.
1590		-3. FW programs the DOE with the requested function (UDS or field entropy de-obfuscation), and the destination for the result.
1591		-4. After de-obfuscation is complete, FW can clear out the UDS and field entropy values from any flops until cptra\_pwrgood de-assertion.
	2366	+A de-obfuscation engine (DOE) is used in conjunction with AES cryptography to de-obfuscate the UDS and field entropy.
	2367	+
	2368	+1. The obfuscation key is driven to the AES key. The data to be decrypted (either obfuscated UDS or obfuscated field entropy) is fed into the AES data.
	2369	+2. An FSM manually drives the AES engine and writes the decrypted data back to the key vault.
	2370	+3. FW programs the DOE with the requested function (UDS or field entropy de-obfuscation), and the destination for the result.
	2371	+4. After de-obfuscation is complete, FW can clear out the UDS and field entropy values from any flops until cptra\_pwrgood de-assertion.
1592	2372
1593	2373	The following tables describe DOE register and control fields.
1594	2374
@@ -1605,13 +2385,13 @@
1605	2385	\| DEST\[4:2\] \| Cptra_rst_b \| Destination register for the result of the de-obfuscation flow. Field entropy writes into DEST and DEST+1 <br>Key entry only, can’t go to PCR . \|
1606	2386
1607	2387
1608		-### Key vault de-obfuscation flow
1609		-
1610		-1. ROM loads IV into DOE. ROM writes to the DOE control register the destination for the de-obfuscated result and sets the appropriate bit to run UDS and/or the field entropy flow.
1611		-2. DOE state machine takes over and loads the Caliptra obfuscation key into the key register.
1612		-3. Next, either the obfuscated UDS or field entropy are loaded into the block register 4 DWORDS at a time.
1613		-4. Results are written to the KV entry specified in the DEST field of the DOE control register.
1614		-5. State machine resets the appropriate RUN bit when the de-obfuscated key is written to KV. FW can poll this register to know when the flow is complete.
	2388	+### Key vault de-obfuscation flow
	2389	+
	2390	+1. ROM loads IV into DOE. ROM writes to the DOE control register the destination for the de-obfuscated result and sets the appropriate bit to run UDS and/or the field entropy flow.
	2391	+2. DOE state machine takes over and loads the Caliptra obfuscation key into the key register.
	2392	+3. Next, either the obfuscated UDS or field entropy are loaded into the block register 4 DWORDS at a time.
	2393	+4. Results are written to the KV entry specified in the DEST field of the DOE control register.
	2394	+5. State machine resets the appropriate RUN bit when the de-obfuscated key is written to KV. FW can poll this register to know when the flow is complete.
1615	2395	6. The clear obf secrets command flushes the obfuscation key, the obfuscated UDS, and the field entropy from the internal flops. This should be done by ROM after both de-obfuscation flows are complete.
1616	2396
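Step 3 of the flow above loads the obfuscated secret into the block register 4 DWORDs at a time, which matches the 128-bit AES block size. The following Python sketch shows this chunking in software. The function name is hypothetical, and the 16-DWORD input is an arbitrary example length, not a statement about the size of the UDS or the field entropy.

```python
def split_into_aes_blocks(dwords):
    """Group a list of 32-bit DWORDs into AES blocks of 4 DWORDs each."""
    if len(dwords) % 4 != 0:
        raise ValueError("length must be a multiple of 4 DWORDs")
    return [dwords[i:i + 4] for i in range(0, len(dwords), 4)]


# Example: an arbitrary 16-DWORD value is processed as four 128-bit blocks.
obfuscated = list(range(16))
blocks = split_into_aes_blocks(obfuscated)
for index, block in enumerate(blocks):
    print(index, [f"0x{dword:08x}" for dword in block])
```
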
1617	2397	## Data vault
@@ -1626,7 +2406,7 @@
1626	2406
1627	2407	## Cryptographic blocks fatal and non-fatal errors
1628	2408
1629		-The following table describes cryptographic errors.
	2409	+The following table describes cryptographic errors.
1630	2410
1631	2411	\| Errors \| Error type \| Description \|
1632	2412	\| :----------- \| :----------------- \| :-------------------------------------------------------------------------------------------------------------------------------------------------------- \|
@@ -1654,6 +2434,7 @@
1654	2434	\| DRBG \| Deterministic Random Bit Generator \|
1655	2435	\| DWORD \| 32-bit (4-byte) data element \|
1656	2436	\| ECDSA \| Elliptic Curve Digital Signature Algorithm \|
	2437	+\| ECDH \| Elliptic Curve Diffie-Hellman Key Exchange \|
1657	2438	\| FMC \| FW First Mutable Code \|
1658	2439	\| FSM \| Finite State Machine \|
1659	2440	\| GPU \| Graphics Processing Unit \|
@@ -1693,20 +2474,21 @@
1693	2474
1694	2475	# References
1695	2476
1696		-1. J. Strömbergson, "Secworks," \[Online\]. Available at https://github.com/secworks.
1697		-2. NIST, Federal Information Processing Standards Publication (FIPS PUB) 180-4 Secure Hash Standard (SHS).
1698		-3. OpenSSL \[Online\]. Available at https://www.openssl.org/docs/man3.0/man3/SHA512.html.
1699		-4. N. W. Group, RFC 3394, Advanced Encryption Standard (AES) Key Wrap Algorithm, 2002.
1700		-5. NIST, Federal Information Processing Standards Publication (FIPS) 198-1, The Keyed-Hash Message Authentication Code, 2008.
1701		-6. N. W. Group, RFC 4868, Using HMAC-SHA256, HMAC-SHA384, and HMAC-SHA512 with IPsec, 2007.
1702		-7. RFC 6979, Deterministic Usage of the Digital Signature Algorithm (DSA) and Elliptic Curve Digital Signature Algorithm (ECDSA), 2013.
1703		-8. TCG, Hardware Requirements for a Device Identifier Composition Engine, 2018.
1704		-9. Coron, J.-S.: Resistance against differential power analysis for elliptic curve cryptosystems. In: Ko¸c, C¸ .K., Paar, C. (eds.) CHES 1999. LNCS, vol. 1717, pp. 292–302.
1705		-10. Schindler, W., Wiemers, A.: Efficient side-channel attacks on scalar blinding on elliptic curves with special structure. In: NISTWorkshop on ECC Standards (2015).
1706		-11. National Institute of Standards and Technology, "Digital Signature Standard (DSS)", Federal Information Processing Standards Publication (FIPS PUB) 186-4, July 2013.
	2477	+1. J. Strömbergson, "Secworks," \[Online\]. Available at https://github.com/secworks.
	2478	+2. NIST, Federal Information Processing Standards Publication (FIPS PUB) 180-4 Secure Hash Standard (SHS).
	2479	+3. OpenSSL \[Online\]. Available at https://www.openssl.org/docs/man3.0/man3/SHA512.html.
	2480	+4. N. W. Group, RFC 3394, Advanced Encryption Standard (AES) Key Wrap Algorithm, 2002.
	2481	+5. NIST, Federal Information Processing Standards Publication (FIPS) 198-1, The Keyed-Hash Message Authentication Code, 2008.
	2482	+6. N. W. Group, RFC 4868, Using HMAC-SHA256, HMAC-SHA384, and HMAC-SHA512 with IPsec, 2007.
	2483	+7. RFC 6979, Deterministic Usage of the Digital Signature Algorithm (DSA) and Elliptic Curve Digital Signature Algorithm (ECDSA), 2013.
	2484	+8. TCG, Hardware Requirements for a Device Identifier Composition Engine, 2018.
	2485	+9. Coron, J.-S.: Resistance against differential power analysis for elliptic curve cryptosystems. In: Koç, Ç.K., Paar, C. (eds.) CHES 1999. LNCS, vol. 1717, pp. 292–302.
	2486	+10. Schindler, W., Wiemers, A.: Efficient side-channel attacks on scalar blinding on elliptic curves with special structure. In: NIST Workshop on ECC Standards (2015).
	2487	+11. National Institute of Standards and Technology, "Digital Signature Standard (DSS)", Federal Information Processing Standards Publication (FIPS PUB) 186-4, July 2013.
1707	2488	12. NIST SP 800-90A, Rev 1: "Recommendation for Random Number Generation Using Deterministic Random Bit Generators", 2012.
1708		-13. CHIPS Alliance, “RISC-V VeeR EL2 Programmer’s Reference Manual” \[Online\] Available at https://github.com/chipsalliance/Cores-VeeR-EL2/blob/main/docs/RISC-V_VeeR_EL2_PRM.pdf.
1709		-14. “The RISC-V Instruction Set Manual, Volume I: User-Level ISA, Document Version 20191213”, Editors Andrew Waterman and Krste Asanović, RISC-V Foundation, December 2019. Available at https://riscv.org/technical/specifications/.
1710		-15. “The RISC-V Instruction Set Manual, Volume II: Privileged Architecture, Document Version 20211203”, Editors Andrew Waterman, Krste Asanović, and John Hauser, RISC-V International, December 2021. Available at https://riscv.org/technical/specifications/.
	2489	+13. CHIPS Alliance, “RISC-V VeeR EL2 Programmer’s Reference Manual” \[Online\] Available at https://github.com/chipsalliance/Cores-VeeR-EL2/blob/main/docs/RISC-V_VeeR_EL2_PRM.pdf.
	2490	+14. “The RISC-V Instruction Set Manual, Volume I: User-Level ISA, Document Version 20191213”, Editors Andrew Waterman and Krste Asanović, RISC-V Foundation, December 2019. Available at https://riscv.org/technical/specifications/.
	2491	+15. “The RISC-V Instruction Set Manual, Volume II: Privileged Architecture, Document Version 20211203”, Editors Andrew Waterman, Krste Asanović, and John Hauser, RISC-V International, December 2021. Available at https://riscv.org/technical/specifications/.
	2492	+16. NIST SP 800-56A, Rev 3: "Recommendation for Pair-Wise Key-Establishment Schemes Using Discrete Logarithm Cryptography", 2018.
1711	2493
1712	2494	<sup>[1]</sup> _Caliptra. Spanish for “root cap” and describes the deepest part of the root_

Changes to Hardware Specification

Image Changes

Images compared between v1.2 and v2.0:

* CONFIGOPTS.png
* Caliptra_eq_CLKDIV.png
* Caliptra_eq_NCO.png
* Caliptra_eq_SPI_clk_period.png
* Caliptra_eq_UART.png
* Caliptra_eq_UART2.png
* Crypto-2p0.png
* ECC_arch.png
* ECDSA_arch.png
* GHASH_TVLA_Figure10ab.png
* GHASH_TVLA_Figure11ab.png
* GHASH_TVLA_Figure12ab.png
* GHASH_TVLA_Figure13ab.png
* GHASH_TVLA_Figure14ab.png
* GHASH_TVLA_Figure15ab.png
* GHASH_TVLA_Figure2.png
* GHASH_TVLA_Figure3.png
* GHASH_TVLA_Figure4.png
* GHASH_TVLA_Figure5.png
* GHASH_TVLA_Figure6ab.png
* GHASH_TVLA_Figure7ab.png
* GHASH_TVLA_Figure8ab.png
* GHASH_TVLA_Figure9ab.png
* HMAC_SHA_512_256.png
* HMAC_pseudo.png
* QSPI_flash.png
* QSPI_segments.png
* SPI_read.png
* UART_block.png
* crypto_subsystem.png
* serial_transmission.png
* sharedkey_pseudo.png