Changes to Hardware Specification

Comparing version 2.0 to 1.2
+1095 additions -313 deletions
@@ -1,12 +1,12 @@
11 <div style="font-size: 0.85em; color: #656d76; margin-bottom: 1em; padding: 0.5em; background: #f6f8fa; border-radius: 4px;">
2-📄 Source: <a href="https://github.com/chipsalliance/caliptra-rtl/blob/5f85fb4bc95b753a2f7d042db7dc2644ca1e8c49/docs/CaliptraHardwareSpecification.md" target="_blank">chipsalliance/caliptra-rtl/docs/CaliptraHardwareSpecification.md</a> @ <code>5f85fb4</code>
2+📄 Source: <a href="https://github.com/chipsalliance/caliptra-rtl/blob/35b0bc5691b2bd0fc180403914cfabe207379089/docs/CaliptraHardwareSpecification.md" target="_blank">chipsalliance/caliptra-rtl/docs/CaliptraHardwareSpecification.md</a> @ <code>35b0bc5</code>
33 </div>
44
55 ![OCP Logo](../images/caliptra-rtl/docs/images/OCP_logo.png)
66
77 <p style="text-align: center;">Caliptra Hardware Specification</p>
88
9-<p style="text-align: center;">Version 1.1</p>
9+<p style="text-align: center;">Revision 2.0.3</p>
1010
1111 <div style="page-break-after: always"></div>
1212
@@ -21,6 +21,23 @@
2121 # Caliptra Core
2222
2323 For information on the Caliptra Core, see the [High level architecture](https://chipsalliance.github.io/Caliptra/doc/Caliptra.html#high-level-architecture) section of [Caliptra: A Datacenter System on a Chip (SoC) Root of Trust (RoT)](https://chipsalliance.github.io/Caliptra/doc/Caliptra.html).
24+
25+## Key Caliptra Core 2.0 Changes
26+* AXI subordinate replaces APB interface of Caliptra 1.x hardware
27+* SHA Accelerator functionality now available exclusively to Caliptra
28+ * Caliptra uC may use it internally in mailbox mode, or in streaming mode via the Caliptra AXI DMA assist engine
29+ * SHA Accelerator adds new SHA save/restore functionality
30+* Adams Bridge Dilithium/ML-DSA (refer to [Adams bridge spec](https://github.com/chipsalliance/adams-bridge/blob/main/docs/AdamsBridgeHardwareSpecification.md))
31+* Subsystem mode support (refer to [Subsystem Specification](https://github.com/chipsalliance/caliptra-ss/blob/main/docs/Caliptra%202.0%20Subsystem%20Specification%201.pdf) for details)
32+ * ECDH hardware support
33+ * HMAC512 hardware support
34+ * AXI Manager with DMA support (refer to [DMA Specification](https://github.com/chipsalliance/caliptra-ss/blob/main/docs/CaliptraSSHardwareSpecification.md#caliptra-axi-manager--dma-assist))
35+ * Manufacturing and Debug Unlock
36+ * UDS programming
37+ * Read logic for Secret Fuses
38+ * Streaming Boot Support
39+* RISC-V core PMP support
40+* CSR HMAC key for manufacturing flow
2441
2542 ## Boot FSM
2643
@@ -57,12 +74,13 @@
5774 | Parameter | Configuration |
5875 | :---------------------- | :------------ |
5976 | Interface | AHB-Lite |
60-| DCCM | 128 KiB |
61-| ICCM | 128 KiB |
77+| DCCM | 256 KiB |
78+| ICCM | 256 KiB |
6279 | I-Cache | Disabled |
6380 | Reset Vector | 0x00000000 |
6481 | Fast Interrupt Redirect | Enabled |
6582 | External Interrupts | 31 |
83+| PMP | Enabled |
6684
6785
6886 ### Embedded memory export
@@ -75,12 +93,12 @@
7593
7694 | Subsystem | Address size | Start address | End address |
7795 | :------------------ | :----------- | :------------ | :---------- |
78-| ROM | 48 KiB | 0x0000_0000 | 0x0000_BFFF |
96+| ROM | 96 KiB | 0x0000_0000 | 0x0001_7FFF |
7997 | Cryptographic | 512 KiB | 0x1000_0000 | 0x1007_FFFF |
8098 | Peripherals | 32 KiB | 0x2000_0000 | 0x2000_7FFF |
81-| SoC IFC | 256 KiB | 0x3000_0000 | 0x3003_FFFF |
82-| RISC-V Core ICCM | 128 KiB | 0x4000_0000 | 0x4001_FFFF |
83-| RISC-V Core DCCM | 128 KiB | 0x5000_0000 | 0x5001_FFFF |
99+| SoC IFC | 512 KiB | 0x3000_0000 | 0x3007_FFFF |
100+| RISC-V Core ICCM | 256 KiB | 0x4000_0000 | 0x4003_FFFF |
101+| RISC-V Core DCCM | 256 KiB | 0x5000_0000 | 0x5003_FFFF |
84102 | RISC-V MM CSR (PIC) | 256 MiB | 0x6000_0000 | 0x6FFF_FFFF |
85103
86104
@@ -92,12 +110,14 @@
92110 | :---------------------------------- | :-------- | :----------- | :------------ | :---------- |
93111 | Cryptographic Initialization Engine | 0 | 32 KiB | 0x1000_0000 | 0x1000_7FFF |
94112 | ECC Secp384 | 1 | 32 KiB | 0x1000_8000 | 0x1000_FFFF |
95-| HMAC384 | 2 | 4 KiB | 0x1001_0000 | 0x1001_0FFF |
113+| HMAC512 | 2 | 4 KiB | 0x1001_0000 | 0x1001_0FFF |
96114 | Key Vault | 3 | 8 KiB | 0x1001_8000 | 0x1001_9FFF |
97115 | PCR Vault | 4 | 8 KiB | 0x1001_A000 | 0x1001_BFFF |
98116 | Data Vault | 5 | 8 KiB | 0x1001_C000 | 0x1001_DFFF |
99117 | SHA512 | 6 | 32 KiB | 0x1002_0000 | 0x1002_7FFF |
100-| SHA256 | 13 | 32 KiB | 0x1002_8000 | 0x1002_FFFF |
118+| SHA256 | 10 | 32 KiB | 0x1002_8000 | 0x1002_FFFF |
119+| ML-DSA | 14 | 64 KiB | 0x1003_0000 | 0x1003_FFFF |
120+| AES | 15 | 4 KiB | 0x1001_1000 | 0x1001_1FFF |
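As an illustration of how the cryptographic subsystem table can be used, the following Python sketch maps an address to the IP that owns it. The address ranges are copied from the 2.0 rows of the table above; the names `CRYPTO_MAP` and `decode_crypto_address` are hypothetical and are not part of the Caliptra RTL or firmware.

```python
# Address ranges copied from the 2.0 cryptographic subsystem table.
# CRYPTO_MAP and decode_crypto_address are illustrative names only.
CRYPTO_MAP = [
    ("Cryptographic Initialization Engine", 0x1000_0000, 0x1000_7FFF),
    ("ECC Secp384", 0x1000_8000, 0x1000_FFFF),
    ("HMAC512", 0x1001_0000, 0x1001_0FFF),
    ("AES", 0x1001_1000, 0x1001_1FFF),
    ("Key Vault", 0x1001_8000, 0x1001_9FFF),
    ("PCR Vault", 0x1001_A000, 0x1001_BFFF),
    ("Data Vault", 0x1001_C000, 0x1001_DFFF),
    ("SHA512", 0x1002_0000, 0x1002_7FFF),
    ("SHA256", 0x1002_8000, 0x1002_FFFF),
    ("ML-DSA", 0x1003_0000, 0x1003_FFFF),
]


def decode_crypto_address(addr):
    """Return the name of the IP whose range contains addr, or None."""
    for name, start, end in CRYPTO_MAP:
        if start <= addr <= end:
            return name
    return None  # unmapped hole inside the cryptographic subsystem


if __name__ == "__main__":
    for probe in (0x1001_1004, 0x1003_0000, 0x1001_2000):
        print(hex(probe), decode_crypto_address(probe))
```

Note that the ranges are not contiguous, so a lookup can legitimately return `None` for an address that falls between two IPs.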
101121
102122
103123 #### Peripherals subsystem
@@ -106,10 +126,8 @@
106126
107127 | IP/Peripheral | Target \# | Address size | Start address | End address |
108128 | :------------ | :-------- | :----------- | :------------ | :---------- |
109-| QSPI | 7 | 4 KiB | 0x2000_0000 | 0x2000_0FFF |
110-| UART | 8 | 4 KiB | 0x2000_1000 | 0x2000_1FFF |
111-| CSRNG | 15 | 4 KiB | 0x2000_2000 | 0x2000_2FFF |
112-| ENTROPY SRC | 16 | 4 KiB | 0x2000_3000 | 0x2000_3FFF |
129+| CSRNG | 12 | 4 KiB | 0x2000_2000 | 0x2000_2FFF |
130+| ENTROPY SRC | 13 | 4 KiB | 0x2000_3000 | 0x2000_3FFF |
113131
114132
115133 #### SoC interface subsystem
@@ -118,10 +136,11 @@
118136
119137 | IP/Peripheral | Target \# | Address size | Start address | End address |
120138 | :------------------------- | :-------- | :----------- | :------------ | :---------- |
121-| Mailbox SRAM Direct Access | 10 | 128 KiB | 0x3000_0000 | 0x3001_FFFF |
122-| Mailbox CSR | 10 | 4 KiB | 0x3002_0000 | 0x3002_0FFF |
123-| SHA512 Accelerator CSR | 10 | 4 KiB | 0x3002_1000 | 0x3002_1FFF |
124-| Mailbox | 10 | 64 KiB | 0x3003_0000 | 0x3003_FFFF |
139+| Mailbox CSR | 7 | 4 KiB | 0x3002_0000 | 0x3002_0FFF |
140+| SHA512 Accelerator | 7 | 4 KiB | 0x3002_1000 | 0x3002_1FFF |
141+| AXI DMA | 7 | 4 KiB | 0x3002_2000 | 0x3002_2FFF |
142+| SOC IFC CSR | 7 | 64 KiB | 0x3003_0000 | 0x3003_FFFF |
143+| Mailbox SRAM Direct Access | 7 | 256 KiB | 0x3004_0000 | 0x3007_FFFF |
125144
126145
127146 #### RISC-V core local memory blocks
@@ -130,8 +149,8 @@
130149
131150 | IP/Peripheral | Target \# | Address size | Start address | End address |
132151 | :-------------- | :-------- | :----------- | :------------ | :---------- |
133-| ICCM0 (via DMA) | 12 | 128 KiB | 0x4000_0000 | 0x4001_FFFF |
134-| DCCM | 11 | 128 KiB | 0x5000_0000 | 0x5001_FFFF |
152+| ICCM0 (via DMA) | 9 | 256 KiB | 0x4000_0000 | 0x4003_FFFF |
153+| DCCM | 8 | 256 KiB | 0x5000_0000 | 0x5003_FFFF |
135154
136155
137156 ### Interrupts
@@ -171,14 +190,16 @@
171190 | SHA512 (Notifications) | 10 | 7 |
172191 | SHA256 (Errors) | 11 | 8 |
173192 | SHA256 (Notifications) | 12 | 7 |
174-| QSPI (Errors) | 13 | 4 |
175-| QSPI (Notifications) | 14 | 3 |
176-| UART (Errors) | 15 | 4 |
177-| UART (Notifications) | 16 | 3 |
178-| RESERVED | 17 | 4 |
179-| RESERVED | 18 | 3 |
193+| RESERVED | 13, 15, 17 | 4 |
194+| RESERVED | 14, 16, 18 | 3 |
180195 | Mailbox (Errors) | 19 | 8 |
181196 | Mailbox (Notifications) | 20 | 7 |
197+| SHA512 Accelerator (Errors) | 21 | 8 |
198+| SHA512 Accelerator (Notifications) | 22 | 7 |
199+| MLDSA (Errors) | 23 | 8 |
200+| MLDSA (Notifications) | 24 | 7 |
201+| AXI DMA (Errors) | 25 | 8 |
202+| AXI DMA (Notifications) | 26 | 7 |
182203
183204
184205 ## Watchdog timer
@@ -230,182 +251,18 @@
230251
231252 As a result of this implementation, 64-bit data transfers are not supported on the Caliptra AHB fabric. Firmware running on the internal microprocessor may only access memory and registers using a 32-bit or smaller request size, as 64-bit transfer requests will be corrupted.
232253
254+All AHB requests internal to Caliptra must target an address that is aligned to the native data width of 4 bytes. Any AHB read or write by the Caliptra RISC-V processor that is not aligned to this boundary fails to decode to the targeted register: writes do not commit the submitted data, and reads return all zeroes. All AHB requests must also use the native size of 4 bytes (encoded in the hsize signal with a value of 2). The only exception is when the RISC-V processor performs byte-aligned, single-byte reads to the Mailbox SRAM using the direct-access mechanism described in [SoC Mailbox](#SoC-mailbox). In this case, a byte-aligned address must be accompanied by the correct size indicator for a single-byte access. Hardware aligns the read address of a byte access to the 4-byte boundary, and the read completes successfully with the correct data at the specified byte offset. Direct-mode SRAM writes must be 4 bytes in size and must be aligned to the 4-byte boundary. Hardware writes the entire dword of data to the aligned address, so attempts to write a partial word of data may result in data corruption.
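The access rules in the preceding paragraph can be summarized as a small predicate. This is a minimal sketch, not a model of the RTL: the function name `ahb_access_ok` is hypothetical, the Mailbox SRAM direct-access range is taken from the 2.0 SoC interface subsystem table, and the byte-size encoding `hsize = 0` is assumed from the standard AHB encoding rather than stated in this document.

```python
# Mailbox SRAM direct-access window (2.0 SoC interface subsystem table).
MBOX_SRAM_START = 0x3004_0000
MBOX_SRAM_END = 0x3007_FFFF

HSIZE_BYTE = 0  # assumption: standard AHB encoding for a 1-byte transfer
HSIZE_WORD = 2  # 4-byte transfer, as stated in the text


def ahb_access_ok(addr, hsize, is_write):
    """Return True if the request follows the alignment and size rules."""
    if hsize == HSIZE_WORD:
        # Native accesses must be aligned to the 4-byte boundary.
        return addr % 4 == 0
    # The only exception: single-byte *reads* of the Mailbox SRAM in
    # direct-access mode, at any byte offset.
    in_mbox_sram = MBOX_SRAM_START <= addr <= MBOX_SRAM_END
    return hsize == HSIZE_BYTE and not is_write and in_mbox_sram


if __name__ == "__main__":
    print(ahb_access_ok(0x1002_0000, HSIZE_WORD, False))  # aligned dword read
    print(ahb_access_ok(0x3004_0001, HSIZE_BYTE, False))  # mailbox byte read
    print(ahb_access_ok(0x3004_0001, HSIZE_BYTE, True))   # mailbox byte write
```

Firmware that needs a single byte from any other region would read the enclosing aligned dword and extract the byte in software.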
255+
233256 ## Cryptographic subsystem
234257
235258 For details, see the [Cryptographic subsystem architecture](#cryptographic-subsystem-architecture) section.
236259
237-## Peripherals subsystem
238-
239-Caliptra includes QSPI and UART peripherals that are used to facilitate alternative operating modes and debug. In the first generation, Caliptra does not support enabling the QSPI interface. Similarly, the UART interface exists to facilitate firmware debug in an FPGA prototype, but should be disabled in final silicon. SystemVerilog defines used to disable these peripherals are described in the [Caliptra Integration Specification](https://github.com/chipsalliance/caliptra-rtl/blob/main/docs/CaliptraIntegrationSpecification.md). Operation of these peripherals is described in the following sections.
240-
241-### QSPI Flash Controller
242-
243-Caliptra implements a QSPI block that can communicate with 2 QSPI devices. This QSPI block is accessible to FW over the AHB-lite Interface.
244-
245-The QSPI block is composed of the spi\_host implementation. For information, see the [SPI\_HOST HWIP Technical Specification](https://opentitan.org/book/hw/ip/spi_host/index.html). The core code (see [spi\_host](https://github.com/lowRISC/opentitan/tree/master/hw/ip/spi_host)) is reused but the interface to the module is changed to AHB-lite and the number of chip select lines supported is increased to 2. The design provides support for Standard SPI, Dual SPI, or Quad SPI commands. The following figure shows the QSPI flash controller.
246-
247-*Figure 4: QSPI flash controller*
248-
249-![](../images/caliptra-rtl/docs/images/QSPI_flash.png)
250-
251-#### Operation
252-
253-Transactions flow through the QSPI block starting with AHB-lite writes to the TXDATA FIFO. Commands are then written and processed by the control FSM, orchestrating transmissions from the TXDATA FIFO and receiving data into the RXDATA FIFO.
254-
255-The structure of a command depends on the device and the command itself. In the case of a standard SPI device, the host IP always transmits data on qspi\_d\_io[0] and always receives data from the target device on qspi\_d\_io[1]. In Dual or Quad modes, all data lines are bi-directional, thus allowing full bandwidth in transferring data across 4 data lines.
256-
257-A typical SPI command consists of different segments that are combined as shown in the following example. Each segment can configure the length, speed, and direction. As an example, the following SPI read transaction consists of 2 segments.
258-
259-*Figure 5: SPI read transaction segments*
260-
261-![](../images/caliptra-rtl/docs/images/SPI_read.png)
262-
263-| Segment \# | Length (Bytes) | Speed | Direction | TXDATA FIFO | RXDATA FIFO |
264-| :--------- | :------------- | :------- | :---------------- | :----------- | :----------------- |
265-| 1 | 4 | standard | TX <br>qspi_d_io\[0\] | \[0\] 0x3 (ReadData) <br>\[1\] Addr\[23:16\] <br>\[2\] Addr\[15:8\] <br>\[3\] Addr\[7:0\] ||
266-| 2 | 1 | standard | RX <br>qspi_d_io\[1\] || \[0\] Data \[7:0\] |
267-
268-
269-In this example, the ReadData (0x3) command was written to the TXDATA FIFO, followed by the 3B address. This maps to a total of 4 bytes that are transmitted out across qspi\_d\_io[0] in the first segment. The second segment consists of a read command that receives 1 byte of data from the target device across qspi\_d\_io[1].
270-
271-QSPI consists of up to four command segments in which the host:
272-
273-1. Transmits instructions or data at the standard rate
274-2. Transmits instructions address or data on 2 or 4 data lines
275-3. Holds the bus in a high-impedance state for some number of dummy cycles where neither side transmits
276-4. Receives information from the target device at the specified rate (derived from the original command)
277-
278-The following example shows the QSPI segments.
279-
280-*Figure 6: QSPI segments*
281-
282-![](../images/caliptra-rtl/docs/images/QSPI_segments.png)
283-
284-| Segment \# | Length (Bytes) | Speed | Direction | TXDATA FIFO | RXDATA FIFO |
285-| :--------- | :------------- | :------- | :------------------ | :----------- | :---------------- |
286-| 1 | 1 | standard | TX <br>qspi_d_io\[3:0\] | \[0\] 0x6B (ReadDataQuad) ||
287-| 2 | 3\* | quad | TX <br>qspi_d_io\[3:0\] | \[1\] Addr\[23:16\] <br>\[2\] Addr\[15:8\] <br>\[3\] Addr\[7:0\] ||
288-| 3 | 2 | N/A | None (Dummy) |||
289-| 4 | 1 | quad | RX <br>qspi_d_io\[3:0\] || \[0\] Data\[7:0\] |
290-
291-
292-Note: In the preceding figure, segment 2 doesn’t show bytes 2 and 3 for brevity.
293-
294-#### Configuration
295-
296-The CONFIGOPTS multi-register has one entry per CSB line and holds clock configuration and timing settings that are specific to each peripheral. After the CONFIGOPTS multi-register is programmed for each SPI peripheral device, the values can be left unchanged.
297-
298-The most common differences between target devices are the requirements for a specific SPI clock phase or polarity, CPOL and CPHA. These clock parameters can be set via the CONFIGOPTS.CPOL or CONFIGOPTS.CPHA register fields.
299-
300-The SPI clock rate depends on the peripheral clock and a 16b clock divider configured by CONFIGOPTS.CLKDIV. The following equation is used to configure the SPI clock period:
301-
302-![](../images/caliptra-rtl/docs/images/Caliptra_eq_SPI_clk_period.png)
303-
304-By default, CLKDIV is set to 0, which means that the maximum frequency that can be achieved is at most half the frequency of the peripheral clock (Fsck = Fclk/2).
305-
306-We can rearrange the equation to solve for the CLKDIV:
307-
308-![](../images/caliptra-rtl/docs/images/Caliptra_eq_CLKDIV.png)
309-
310-Assuming a 400MHz target peripheral, and a SPI clock target of 100MHz:
311-
312-CONFIGOPTS.CLKDIV = (400/(2\*100)) -1 = 1
313-
314-The following figure shows CONFIGOPTS.
315-
316-*Figure 7: CONFIGOPTS*
317-
318-![](../images/caliptra-rtl/docs/images/CONFIGOPTS.png)
319-
320-#### Signal descriptions
321-
322-The QSPI block architecture inputs and outputs are described in the following table.
323-
324-| Name | Input or output | Description |
325-| :------------------ | :-------------- | :-------------------------------------------------------- |
326-| clk_i | input | All signal timings are related to the rising edge of clk. |
327-| rst_ni | input | The reset signal is active LOW and resets the core. |
328-| cio_sck_o | output | SPI clock |
329-| cio_sck_en_o | output | SPI clock enable |
330-| cio_csb_o\[1:0\] | output | Chip select \# (one hot, active low) |
331-| cio_csb_en_o\[1:0\] | output | Chip select \# enable (one hot, active low) |
332-| cio_csb_sd_o\[3:0\] | output | SPI data output |
333-| cio_csb_sd_en_o | output | SPI data output enable |
334-| cio_csb_sd_i\[3:0\] | input | SPI data input |
335-
336-
337-#### SPI\_HOST IP programming guide
338-
339-The operation of the SPI\_HOST IP proceeds in seven general steps.
340-
341-To initialize the IP:
342-
343-1. Program the CONFIGOPTS multi-register with the appropriate timing and polarity settings for each csb line.
344-2. Set the desired interrupt parameters.
345-3. Enable the IP.
346-
347-Then for each command:
348-
349-4. Load the data to be transmitted into the FIFO using the TXDATA memory window.
350-5. Specify the target device by programming the CSID.
351-6. Specify the structure of the command by writing each segment into the COMMAND register.
352-
353- For multi-segment transactions, assert COMMAND.CSAAT for all but the last command segment.
354-
355-7. For transactions that expect to receive a reply, the data can then be read back from the RXDATA window.
356-
357-Steps 4-7 are then repeated for each subsequent command.
358-
359-### UART
360-
361-Caliptra implements a UART block that can communicate with a serial device that is accessible to FW over the AHB-lite Interface. This is a configuration that the SoC opts-in by defining CALIPTRA\_INTERNAL\_UART.
362-
363-The UART block is composed of the uart implementation. For information, see the [UART HWIP Technical Specification](https://opentitan.org/book/hw/ip/uart/). The design provides support for a programmable baud rate. The UART block is shown in the following figure.
364-
365-*Figure 8: UART block*
366-
367-![](../images/caliptra-rtl/docs/images/UART_block.png)
368-
369-#### Operation
370-
371-Transactions flow through the UART block starting with an AHB-lite write to WDATA, which triggers the transmit module to start a UART TX serial data transfer. The TX module dequeues the byte from the internal FIFO and shifts it out bit by bit at the baud rate. If TX is not enabled, the output is set high and WDATA in the FIFO is queued up.
372-
373-The following figure shows the transmit data on the serial lane, starting with the START bit, which is indicated by a high to low transition, followed by the 8 bits of data.
374-
375-*Figure 9: Serial transmission frame*
376-
377-![](../images/caliptra-rtl/docs/images/serial_transmission.png)
378-
379-On the receive side, after the START bit is detected, the data is sampled at the center of each data bit and stored into a FIFO. A user can monitor the FIFO status and read the data out of RDATA.
380-
381-#### Configuration
382-
383-The baud rate can be configured using the CTRL.NCO register field. This should be set using the following equation:
384-
385-![](../images/caliptra-rtl/docs/images/Caliptra_eq_NCO.png)
386-
387-If the desired baud rate is 115,200bps:
388-
389-![](../images/caliptra-rtl/docs/images/Caliptra_eq_UART.png)
390-
391-![](../images/caliptra-rtl/docs/images/Caliptra_eq_UART2.png)
392-
393-#### Signal descriptions
394-
395-The UART block architecture inputs and outputs are described in the following table.
396-
397-| Name | Input or output | Description |
398-| :------- | :-------------- | :-------------------------------------------------------- |
399-| clk_i | input | All signal timings are related to the rising edge of clk. |
400-| rst_ni | input | The reset signal is active LOW and resets the core. |
401-| cio_rx_i | input | Serial receive bit |
402-| cio_tx_o | output | Serial transmit bit |
403-
404-
405260 ## SoC mailbox
406261
407262 For more information on the mailbox protocol, see [Mailbox](https://github.com/chipsalliance/caliptra-rtl/blob/main/docs/CaliptraIntegrationSpecification.md#mailbox) in the Caliptra Integration Specification. Mailbox registers accessible to the Caliptra microcontroller are defined in [internal-regs/mbox_csr](https://chipsalliance.github.io/caliptra-rtl/main/internal-regs/?p=clp.mbox_csr).
408263
264+The RISC-V processor is able to access the SoC mailbox SRAM using a direct access mode (which bypasses the defined mailbox protocol). The addresses for performing this access are described in [SoC interface subsystem](#SoC-interface-subsystem) and in [mbox_sram](https://chipsalliance.github.io/caliptra-rtl/main/internal-regs/?p=clp.mbox_sram). In this mode, firmware must first acquire the mailbox lock. Then, reads and writes to the direct access address region will go directly to the SRAM block. Firmware must release the mailbox lock by writing to the [mbox_unlock](https://chipsalliance.github.io/caliptra-rtl/main/internal-regs/?p=clp.mbox_csr.mbox_unlock) register after direct access operations are completed.
265+
409266
410267 ## Security state
411268
@@ -417,7 +274,7 @@
417274
418275 * Caliptra JTAG is opened for the microcontroller and HW debug.
419276
420-* Device secrets (UDS, FE, key vault, and obfuscation key) are programmed to debug values.
277+* Device secrets (UDS, FE, key vault, CSR HMAC key, and obfuscation key) are programmed to debug values.
421278
422279 If a transition to debug mode happens during ROM operation, any values computed from the use of device secrets may not match expected values.
423280
@@ -428,11 +285,14 @@
428285 | Name | Default value |
429286 | :-------------------------- | :------------ |
430287 | Obfuscation Key Debug Value | All 0x1 |
288+| CSR HMAC Key Debug Value | All 0x1 |
431289 | UDS Debug Value | All 0x1 |
432290 | Field Entropy Debug Value | All 0x1 |
433291 | Key Vault Debug Value 0 | All 0xA |
434292 | Key Vault Debug Value 1 | All 0x5 |
435293
294+
295+Note: When entering debug or scan mode, all crypto engines are zeroized. Before starting any crypto operation in these modes, the status registers of all crypto engines must be checked to confirm they are ready. Failing to do so may trigger a fatal error caused by concurrent crypto operations.
436296
437297 ## Clock gating
438298
@@ -472,17 +332,17 @@
472332
473333 * JTAG accesses
474334
475-* APB transactions
476-
477-Activity on the APB interface only wakes up the SoC IFC clock. All other clocks remain off until any other condition is met or the core exits the halt state.
478-
479-| Cpu_halt_status | PSEL | Generic input wires <br>&#124;&#124; fatal error <br>&#124;&#124; debug/scan mode <br> &#124;&#124;JTAG access | Expected behavior |
335+* AXI transactions
336+
337+Activity on the AXI subordinate interface only wakes up the SoC IFC clock. All other clocks remain off until any other condition is met or the core exits the halt state.
338+
339+| Cpu_halt_status | s_axi_active | Generic input wires <br>&#124;&#124; fatal error <br>&#124;&#124; debug/scan mode <br> &#124;&#124;JTAG access | Expected behavior |
480340 | :-------------- | :--- | :---------- | :-------------- |
481341 | 0 | X | X | All gated clocks active |
482342 | 1 | 0 | 0 | All gated clocks inactive |
483343 | 1 | 0 | 1 | All gated clocks active (as long as condition is true) |
484-| 1 | 1 | 0 | Soc_ifc_clk_cg active (as long as PSEL = 1) <br>All other clks inactive |
485-| 1 | 1 | 1 | Soc_ifc_clk_cg active (as long as condition is true OR PSEL = 1) <br>All other clks active (as long as condition is true) |
344+| 1 | 1 | 0 | Soc_ifc_clk_cg active (as long as s_axi_active = 1) <br>All other clks inactive |
345+| 1 | 1 | 1 | Soc_ifc_clk_cg active (as long as condition is true OR s_axi_active = 1) <br>All other clks active (as long as condition is true) |
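The truth table above can be read as two boolean expressions. The following Python sketch restates it under that reading; the function name `gated_clock_state` and the dictionary keys are hypothetical, and `wake_condition` stands for the third column (generic input wires, fatal error, debug/scan mode, or JTAG access).

```python
def gated_clock_state(cpu_halted, s_axi_active, wake_condition):
    """Return which gated clocks are active for one row of the table.

    cpu_halted     -- Cpu_halt_status column
    s_axi_active   -- activity on the AXI subordinate interface
    wake_condition -- OR of generic input wires, fatal error,
                      debug/scan mode, and JTAG access
    """
    if not cpu_halted:
        # Core is running: every gated clock is active.
        return {"soc_ifc_clk_cg": True, "other_clks": True}
    return {
        # AXI activity alone is enough to wake the SoC interface clock.
        "soc_ifc_clk_cg": s_axi_active or wake_condition,
        # All other clocks need one of the wake conditions.
        "other_clks": wake_condition,
    }


if __name__ == "__main__":
    for row in [(0, 0, 0), (1, 0, 0), (1, 0, 1), (1, 1, 0), (1, 1, 1)]:
        print(row, gated_clock_state(*map(bool, row)))
```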
486346
487347
488348 ### Usage
@@ -490,7 +350,7 @@
490350 The following applies to the clock gating feature:
491351
492352 * The core should only be halted after all pending vault writes are done and cryptographic operations are complete.
493-* While the core is halted, any APB transaction wakes up the SoC interface clock and leaves all other clocks disabled. If the core is still halted when the APB transactions are done, the SoC interface clock is returned to a disabled state. .
353+* While the core is halted, any AXI transaction wakes up the SoC interface clock and leaves all other clocks disabled. If the core is still halted when the AXI transactions are done, the SoC interface clock is returned to a disabled state.
494354 * The RDC clock is similar to an ungated clock and is only disabled when a reset event occurs. This avoids metastability on flops. The RDC clock operates independently of core halt status.
495355
496356
@@ -530,7 +390,7 @@
530390
531391 ### Operation
532392
533-Requests for entropy bits start with [command requests](https://opentitan.org/book/hw/ip/csrng/doc/theory_of_operation.html#general-command-format) over the AHB-lite interface to the csrng [CMD\_REQ](https://chipsalliance.github.io/caliptra-rtl/main/internal-regs/?p=clp.csrng_reg.CMD_REQ) register.
393+Requests for entropy bits start with [command requests](https://opentitan.org/book/hw/ip/csrng/doc/theory_of_operation.html#general-command-format) over the AHB-lite interface to the csrng [CMD\_REQ](https://chipsalliance.github.io/caliptra-rtl/main/internal-regs/?p=clp.csrng_reg.CMD_REQ) register.
534394
535395 The following describes the fields of the command request header:
536396
@@ -542,7 +402,7 @@
542402
543403 * Generate Length: Only defined for the generate command, this field is the total number of cryptographic entropy blocks requested. Each unit represents 128 bits of entropy returned. A value of 8 would return a total of 1024 bits. The maximum size supported is 4096.
544404
545-First an instantiate command is requested over the SW application interface to initialize an instance in the CSRNG module. Depending on the flag0 and clen fields in the command header, a request to the entropy\_src module over the entropy interface is sent to seed the csrng. This can take a few milliseconds if the seed entropy is not immediately available.
405+First an instantiate command is requested over the SW application interface to initialize an instance in the CSRNG module. Depending on the flag0 and clen fields in the command header, a request to the entropy\_src module over the entropy interface is sent to seed the csrng. This can take a few milliseconds if the seed entropy is not immediately available.
546406
547407 Example instantiation:
548408
@@ -560,7 +420,7 @@
560420 | T | 1-12 | Only provided additional data is used as seed. |
561421
562422
563-Next a generate command is used to request generation of cryptographic entropy bits. The glen field defines how many 128 bit words are to be returned to the application interface. After the generated bits are ready, they can be read out via the GENBITS register. This register must be read out glen \* 4 times for each request made.
423+Next a generate command is used to request generation of cryptographic entropy bits. The glen field defines how many 128 bit words are to be returned to the application interface. After the generated bits are ready, they can be read out via the GENBITS register. This register must be read out glen \* 4 times for each request made.
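The relationship between glen, the number of entropy bits returned, and the number of GENBITS reads is simple arithmetic. The sketch below assumes that each GENBITS read returns 32 bits, which is what "glen \* 4 reads per 128-bit block" implies; the helper names are hypothetical, and the lower bound of 1 on glen is an assumption made for the example.

```python
BITS_PER_BLOCK = 128      # each glen unit returns 128 bits of entropy
GENBITS_READ_WIDTH = 32   # assumption: implied by "glen * 4" reads
MAX_GLEN = 4096           # maximum generate length stated in the text


def entropy_bits(glen):
    """Total entropy bits returned by a generate command."""
    if not 1 <= glen <= MAX_GLEN:
        raise ValueError(f"glen must be in 1..{MAX_GLEN}, got {glen}")
    return glen * BITS_PER_BLOCK


def genbits_reads(glen):
    """Number of GENBITS register reads needed to drain one request."""
    return entropy_bits(glen) // GENBITS_READ_WIDTH


if __name__ == "__main__":
    # glen = 8 returns 1024 bits, read out in 32 GENBITS accesses.
    print(entropy_bits(8), genbits_reads(8))
```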
564424
565425 Example generate command:
566426
@@ -634,6 +494,111 @@
634494
635495 The CSRNG may only be enabled if entropy\_src is enabled. After it is disabled, CSRNG may only be re-enabled after entropy\_src has been disabled and re-enabled.
636496
497+### FIPS considerations
498+
499+The following sections illustrate the self-test parameter configuration. The
500+`entropy_src` block provides additional tests, but Caliptra focuses primarily
501+on the adaptive and repetition count tests, which are the ones strictly
502+required for FIPS compliance. Additional details can be found in NIST
503+publication SP 800-90B.
504+
505+The TRNG must be re-initialized whenever self-test parameter changes are
506+needed. As described in the previous section, the initialization steps
507+are as follows:
508+
509+1. Disable `csrng` and `entropy_src` in that order.
510+2. Apply new self-test configuration.
511+3. Enable `entropy_src` and `csrng` in that order.
512+
513+### Adaptive self-test window and thresholds
514+
515+This section details the configuration of the `entropy_src`, focusing on how
516+the test window size for the adaptive self-test is determined and how it
517+relates to threshold calculations.
518+
519+#### Understanding Test Window Sizes
520+
521+The adaptive self-test within the `entropy_src` block utilizes a
522+configurable test window. To clarify its interpretation, two terms are
523+defined:
524+
525+* `ENTROPY_TEST_WINDOW`: This refers to the test window size directly
526+ configured in the hardware registers of the `entropy_src` block.
527+* `ACTUAL_TEST_WINDOW`: This refers to the effective window size used for
528+ the adaptive self-test threshold calculations. Its value depends on how
529+ the test scores are aggregated.
530+
531+The aggregation method is determined by the `CONF.THRESHOLD_SCOPE` setting in
532+the `entropy_src` block.
533+
534+#### Aggregate per symbol
535+
536+When `CONF.THRESHOLD_SCOPE` is enabled:
537+
538+* The adaptive test combines the inputs from all physical entropy lines
539+ into a single, cumulative score.
540+* The test essentially treats the combined input as a single binary stream,
541+ counting the occurrences of '1's.
542+* In this configuration:
543+ * If `ENTROPY_TEST_WINDOW` is set to 1024, then
544+ * `ACTUAL_TEST_WINDOW` = `ENTROPY_TEST_WINDOW` = 1024
545+
546+#### Handle each physical noise source separately
547+
548+When `CONF.THRESHOLD_SCOPE` is disabled:
549+
550+* The adaptive test scores each individual physical noise input line
551+ independently.
552+* This allows for monitoring the health of each noise source.
553+* In this configuration (assuming, for example, 4 noise sources):
554+ * If `ENTROPY_TEST_WINDOW` is set to 4096 bits, then
555+ * `ACTUAL_TEST_WINDOW` = (`ENTROPY_TEST_WINDOW` / 4) = 1024
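The two aggregation cases can be captured in one helper. This is an illustrative sketch only: `actual_test_window` is a hypothetical name, and the default of 4 noise sources mirrors the example in the text rather than a value read from hardware.

```python
def actual_test_window(entropy_test_window, threshold_scope_enabled,
                       noise_sources=4):
    """Derive ACTUAL_TEST_WINDOW from the configured ENTROPY_TEST_WINDOW.

    threshold_scope_enabled -- value of CONF.THRESHOLD_SCOPE
    noise_sources           -- number of physical noise input lines
                               (4 in the example above)
    """
    if threshold_scope_enabled:
        # All lines are aggregated into one stream: the window is unchanged.
        return entropy_test_window
    if entropy_test_window % noise_sources != 0:
        raise ValueError("window must divide evenly across noise sources")
    # Each line is scored separately and sees an equal share of the window.
    return entropy_test_window // noise_sources


if __name__ == "__main__":
    print(actual_test_window(1024, threshold_scope_enabled=True))
    print(actual_test_window(4096, threshold_scope_enabled=False))
```

Both calls in the usage example yield an `ACTUAL_TEST_WINDOW` of 1024, matching the two cases described above.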
556+
557+#### Configuring adaptive self-test thresholds
558+
559+Once the `ACTUAL_TEST_WINDOW` is determined, the adaptive self-test
560+thresholds can be configured as follows:
561+
562+* `ADAPTP_HI_THRESHOLDS.FIPS_THRESH` = `adaptp_cutoff`
563+* `ADAPTP_LO_THRESHOLDS.FIPS_THRESH` = `ACTUAL_TEST_WINDOW` - `adaptp_cutoff`
564+
565+Here, `adaptp_cutoff` represents the pre-determined cutoff value for the
566+adaptive proportion test, as defined by NIST SP 800-90B. See the threshold
567+calculations below as an example.
568+
569+\\(α = 2^{-40}\\) (recommended)\
570+\\(H = 0.5\\) (example, estimated entropy measured from hardware)\
571+\\(W\\) = `ACTUAL_TEST_WINDOW`\
572+`adaptp_cutoff` = \\(1 + critbinom(W, 2^{-H}, 1 - α)\\)
573+
574+> Note: The `critbinom` function (critical binomial distribution function) is
575+> implemented by most spreadsheet applications.
576+
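For readers without a spreadsheet at hand, the threshold calculation can also be sketched in Python using only the standard library. The function names below are hypothetical. `critbinom(W, p, alpha)` here takes α directly and returns the smallest k for which the binomial cumulative probability reaches 1 − α; it accumulates the upper tail instead of the cumulative sum so that very small values of α such as 2^-40 do not get lost in floating-point rounding. The sketch assumes 0 < H, so that 2^-H is strictly less than 1.

```python
import math


def binom_pmf(trials, k, p):
    """P(X = k) for X ~ Binomial(trials, p), computed in log space."""
    log_pmf = (math.lgamma(trials + 1) - math.lgamma(k + 1)
               - math.lgamma(trials - k + 1)
               + k * math.log(p) + (trials - k) * math.log1p(-p))
    return math.exp(log_pmf)


def critbinom(trials, p, alpha):
    """Smallest k such that P(X <= k) >= 1 - alpha."""
    tail = 0.0  # P(X > k), starting from k = trials
    k = trials
    while k > 0:
        next_tail = tail + binom_pmf(trials, k, p)  # P(X > k - 1)
        if next_tail > alpha:
            break
        tail = next_tail
        k -= 1
    return k


def adaptp_thresholds(actual_test_window, h, alpha=2.0 ** -40):
    """Return (hi, lo) FIPS_THRESH values for the adaptive proportion test."""
    adaptp_cutoff = 1 + critbinom(actual_test_window, 2.0 ** -h, alpha)
    return adaptp_cutoff, actual_test_window - adaptp_cutoff


if __name__ == "__main__":
    hi, lo = adaptp_thresholds(1024, h=0.5)
    print("ADAPTP_HI_THRESHOLDS.FIPS_THRESH =", hi)
    print("ADAPTP_LO_THRESHOLDS.FIPS_THRESH =", lo)
```

The entropy estimate H = 0.5 is only the example value used above; a real configuration would substitute the entropy measured from the hardware noise source.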
577+### Recommended configuration
578+
579+The following configuration is recommended for the adaptive and repetition
580+count tests:
581+
582+#### Adaptive test
583+
584+1. Set `CONF.THRESHOLD_SCOPE` to disabled. This allows the test to monitor
585+ and score each physical noise source individually, providing more granular
586+ health information.
587+2. Set `HEALTH_TEST_WINDOWS.FIPS_WINDOW` to 4096 bits. This value serves
588+ as the `ENTROPY_TEST_WINDOW`. With the current 4 noise source configuration,
589+ this is equivalent to 1024 bits per noise source, where each source produces
590+ 1 bit of entropy as defined in NIST SP 800-90B.
591+3. Calculate thresholds. Use an `ACTUAL_TEST_WINDOW` of 1024 bits (derived
592+ from step 2) in the adaptive test threshold formulas provided earlier in
593+ this subsection.
594+
595+#### Repetition count test
596+
597+The methodology used for calculating the repetition count threshold in the
598+ROM boot phase can be directly applied for this test as well. The threshold is
599+applied on a per-noise-source basis.
600+
601+
637602 ## External-TRNG REQ HW API
638603
639604 For SoCs that choose to not instantiate Caliptra’s integrated TRNG, Caliptra provides a TRNGREQ HW API.
@@ -647,18 +612,16 @@
647612
648613 ## SoC-SHA accelerator HW API
649614
650-Caliptra provides a SHA accelerator HW API for SoC and Caliptra internal FW to use. It is atomic in nature in that only one of them can use the SHA accelerator HW API at the same time. Details of the SHA accelerator register block may be found in the GitHub repository in [documentation](https://chipsalliance.github.io/caliptra-rtl/main/external-regs/?p=caliptra_top_reg.sha512_acc_csr) generated from the register definition file.
615+Caliptra provides a SHA accelerator HW API for Caliptra internal FW to use via mailbox or via DMA operations through the AXI subordinate interface. The SHA accelerator HW API is restricted on AXI for use by Caliptra via the AXI DMA assist block; this access restriction is enforced by checking logic on the AXI AxUSER signal associated with the request.
651616
652617 Using the HW API:
653618
654619 * A user of the HW API first locks the accelerator by reading the LOCK register. A read that returns the value 0 indicates that the resource was locked for exclusive use by the requesting user. A write of ‘1 clears the lock.
655-* The USER register captures the APB pauser value of the requestor that locked the SHA accelerator. This is the only user that is allowed to control the SHA accelerator by performing APB register writes. Writes by any other agent on the APB interface are dropped.
656-* MODE register is written to set the SHA execution mode.
657- * SHA accelerator supports both SHA384 and SHA512 modes of operation.
658- * SHA supports **streaming** mode: SHA is computed on a stream of incoming data to the DATAIN register. The EXECUTE register, when set, indicates to the accelerator that streaming is complete. The accelerator can then publish the result into the DIGEST register. When the VALID bit of the STATUS register is set, then the result in the DIGEST register is valid.
659- * SHA supports **Mailbox** mode: SHA is computed on LENGTH (DLEN) bytes of data stored in the mailbox beginning at START\_ADDRESS. This computation is performed when the EXECUTE register is set by the user. When the operation is completed and the result in the DIGEST register is valid, SHA accelerator sets the VALID bit of the STATUS register.
660- * The SHA computation engine in the SHA accelerator requires big endian data, but the SHA accelerator can accommodate mailbox input data in either the little endian or big endian format. By default, input data is assumed to be little endian and is swizzled to big endian at the byte level prior to computation. For the big endian format, data is loaded into the SHA engine as-is. Users may configure the SHA accelerator to treat data as big endian by setting the ENDIAN\_TOGGLE bit appropriately.
661- * See the register definition for the encodings.
620+* The USER register captures the AXI USERID value of the requestor that locked the SHA accelerator. This is the only user that is allowed to control the SHA accelerator by performing AXI register writes. Writes by any other agent on the AXI subordinate interface are dropped.
621+* SHA supports **Mailbox** mode: SHA is computed on LENGTH (DLEN) bytes of data stored in the mailbox beginning at START\_ADDRESS. This computation is performed when the EXECUTE register is set by the user. When the operation is completed and the result in the DIGEST register is valid, SHA accelerator sets the VALID bit of the STATUS register.
622+* Although the mailbox size is fixed, the SHA save/restore enhancement removes any limit on the size of the data that can be hashed. The SoC must follow the FW API.
623+* The SHA computation engine in the SHA accelerator requires big endian data, but the SHA accelerator can accommodate mailbox input data in either the little endian or big endian format. By default, input data is assumed to be little endian and is swizzled to big endian at the byte level prior to computation. For the big endian format, data is loaded into the SHA engine as-is. Users may configure the SHA accelerator to treat data as big endian by setting the ENDIAN\_TOGGLE bit appropriately.
624+* See the register definition for the encodings.
662625 * SHA engine also provides a ‘zeroize’ function through its CONTROL register to clear any of the SHA internal state. This can be used when the user wants to conceal previous state for debug or security reasons.
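
The lock, execute, and valid sequence described above can be illustrated with a small software model. This is a sketch under stated assumptions, not the RTL behavior: the class `ShaAccModel` and its method names are hypothetical, SHA-384 is assumed as the digest, and the byte-level swizzle is assumed to operate on 32-bit words.

```python
import hashlib


def swizzle_dwords(data: bytes) -> bytes:
    """Reverse the byte order inside each 32-bit word (assumed granularity)."""
    if len(data) % 4:
        raise ValueError("length must be a multiple of 4 bytes")
    return b"".join(data[i:i + 4][::-1] for i in range(0, len(data), 4))


class ShaAccModel:
    """Toy model of the LOCK / USER / EXECUTE / STATUS.VALID flow."""

    def __init__(self, mailbox: bytes):
        self.mailbox = mailbox
        self.locked_by = None  # models the USER register
        self.digest = None     # models the DIGEST register
        self.valid = False     # models STATUS.VALID

    def read_lock(self, user: int) -> int:
        # A read that returns 0 means the lock was granted to this reader.
        if self.locked_by is None:
            self.locked_by = user
            return 0
        return 1

    def write_lock(self, user: int, value: int) -> None:
        # Writing 1 clears the lock; writes from other users are dropped.
        if user == self.locked_by and value == 1:
            self.locked_by = None
            self.valid = False

    def execute(self, user: int, start: int, dlen: int,
                big_endian: bool = False) -> None:
        if user != self.locked_by:
            return  # writes by any agent other than the lock owner are dropped
        data = self.mailbox[start:start + dlen]
        if not big_endian:
            data = swizzle_dwords(data)  # little endian input is swizzled
        self.digest = hashlib.sha384(data).digest()
        self.valid = True


# Usage: the mailbox holds the message as little-endian words (the default).
message = b"abcdefgh"
acc = ShaAccModel(mailbox=swizzle_dwords(message))
if acc.read_lock(user=1) == 0:
    acc.execute(user=1, start=0, dlen=len(message))
    print(acc.valid, acc.digest.hex()[:16])
    acc.write_lock(user=1, value=1)
```

The model keeps the lock owner in a single field so that the "writes by other agents are dropped" rule is one comparison; real hardware compares the AXI user attribute instead.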
663626
664627 ## JTAG implementation
@@ -683,7 +646,7 @@
683646 * De-obfuscation engine
684647 * SHA512/384 (based on NIST FIPS 180-4 [2])
685648 * SHA256 (based on NIST FIPS 180-4 [2])
686- * HMAC384 (based on [NIST FIPS 198-1](https://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.198-1.pdf) [5] and [RFC 4868](https://tools.ietf.org/html/rfc4868) [6])
649+ * HMAC512 (based on [NIST FIPS 198-1](https://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.198-1.pdf) [5] and [RFC 4868](https://tools.ietf.org/html/rfc4868) [6])
687650 * Public-key cryptography
688651 * NIST Secp384r1 Deterministic Digital Signature Algorithm (based on FIPS-186-4 [11] and RFC 6979 [7])
689652 * Key vault
@@ -694,7 +657,7 @@
694657
695658 *Figure 17: Caliptra cryptographic subsystem*
696659
697-![](../images/caliptra-rtl/docs/images/crypto_subsystem.png)
660+![](../images/caliptra-rtl/docs/images/Crypto-2p0.png)
698661
699662 ## SHA512/SHA384
700663
@@ -927,13 +890,13 @@
927890 | 1 KiB message | 8761 | 21.90 | 45,657 |
928891
929892
930-## HMAC384
931-
932-Hash-based message authentication code (HMAC) is a cryptographic authentication technique that uses a hash function and a secret key. HMAC involves a cryptographic hash function and a secret cryptographic key. This implementation supports HMAC-SHA-384-192 as specified in [NIST FIPS 198-1](https://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.198-1.pdf) [5]. The implementation is compatible with the HMAC-SHA-384-192 authentication and integrity functions defined in [RFC 4868](https://tools.ietf.org/html/rfc4868) [6].
933-
934-Caliptra HMAC implementation uses SHA384 as the hash function, accepts a 384-bit key, and generates a 384-bit tag.
935-
936-The implementation also supports PRF-HMAC-SHA-384. The PRF-HMAC-SHA-384 algorithm is identical to HMAC-SHA-384-192, except that variable-length keys are permitted, and the truncation step is not performed.
893+## HMAC512/HMAC384
894+
895+Hash-based message authentication code (HMAC) is a cryptographic authentication technique that uses a hash function and a secret key. This implementation supports the HMAC512 variants HMAC-SHA-512-256 and HMAC-SHA-384-192 as specified in [NIST FIPS 198-1](https://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.198-1.pdf) [5]. The implementation is compatible with the HMAC-SHA-512-256 and HMAC-SHA-384-192 authentication and integrity functions defined in [RFC 4868](https://tools.ietf.org/html/rfc4868) [6].
896+
897+Caliptra HMAC implementation uses SHA512 as the hash function, accepts a 512-bit key, and generates a 512-bit tag.
898+
899+The implementation also supports PRF-HMAC-SHA-512. The PRF-HMAC-SHA-512 algorithm is identical to HMAC-SHA-512-256, except that variable-length keys are permitted, and the truncation step is not performed.
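
The relationship between the truncated and untruncated variants can be shown with the Python standard library. This is a software sketch for reference only; the function names are hypothetical and nothing here models the hardware core.

```python
import hashlib
import hmac


def prf_hmac_sha512(key: bytes, msg: bytes) -> bytes:
    """PRF-HMAC-SHA-512: full 512-bit tag, no truncation step."""
    return hmac.new(key, msg, hashlib.sha512).digest()


def hmac_sha512_256(key: bytes, msg: bytes) -> bytes:
    """HMAC-SHA-512-256: the 256 most significant bits of the 512-bit tag."""
    return prf_hmac_sha512(key, msg)[:32]


def hmac_sha384_192(key: bytes, msg: bytes) -> bytes:
    """HMAC-SHA-384-192: the 192 most significant bits of the 384-bit tag."""
    return hmac.new(key, msg, hashlib.sha384).digest()[:24]


key = bytes(range(64))  # example 512-bit key
msg = b"example message"
print(len(prf_hmac_sha512(key, msg)) * 8)  # tag width in bits
print(len(hmac_sha512_256(key, msg)) * 8)
print(len(hmac_sha384_192(key, msg)) * 8)
```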
937900
938901 The HMAC algorithm is described as follows:
939902 * The key is fed to the HMAC core to be padded
@@ -980,9 +943,15 @@
980943
981944 #### Hashing
982945
983-The HMAC core performs the sha2-384 function to process the hash value of the given message. The algorithm processes each block of the 1024 bits from the message, using the result from the previous block. This data flow is shown in the following figure.
984-
985-*Figure 28: HMAC-SHA-384-192 data flow*
946+The HMAC512 core performs the sha2-512 function to process the hash value of the given message. The algorithm processes each block of the 1024 bits from the message, using the result from the previous block. This data flow is shown in the following figure.
947+
948+*Figure 28: HMAC-SHA-512-256 data flow*
949+
950+![](../images/caliptra-rtl/docs/images/HMAC_SHA_512_256.png)
951+
952+The HMAC384 core performs the sha2-384 function to process the hash value of the given message. The algorithm processes each block of the 1024 bits from the message, using the result from the previous block. This data flow is shown in the following figure.
953+
954+*Figure 29: HMAC-SHA-384-192 data flow*
986955
987956 ![](../images/caliptra-rtl/docs/images/HMAC_SHA_384_192.png)
988957
@@ -990,26 +959,33 @@
990959
991960 The HMAC architecture has the finite-state machine as shown in the following figure.
992961
993-*Figure 29: HMAC FSM*
962+*Figure 30: HMAC FSM*
994963
995964 ![](../images/caliptra-rtl/docs/images/HMAC_FSM.png)
996965
966+### CSR Mode
967+
968+When the CSR Mode register is set, the HMAC512 core uses the value latched from the cptra_csr_hmac_key interface pins in place of the API key register. These pins are latched internally after powergood assertion during DEVICE_MANUFACTURING lifecycle state. During debug mode operation this value is overridden with all 1's, and during any other lifecycle state it has a value of zero.
969+
997970 ### Signal descriptions
998971
999972 The HMAC architecture inputs and outputs are described in the following table.
1000973
1001974 | Name | Input or output | Description |
1002-| :----------------- | :-------------- | :----------- |
975+| :-------------------------- | :-------------- | :----------- |
1003976 | clk | input | All signal timings are related to the rising edge of clk. |
1004977 | reset_n | input | The reset signal is active LOW and resets the core. This is the only active LOW signal. |
1005978 | init | input | The core is initialized and processes the key and the first block of the message. |
1006979 | next | input | The core processes the rest of the message blocks using the result from the previous blocks. |
1007980 | zeroize | input | The core clears all internal registers to avoid any SCA information leakage. |
1008-| key\[383:0\] | input | The input key. |
981+| csr_mode | input | When set, the key comes from the cptra_csr_hmac_key interface pins. This key is valid only during MANUFACTURING mode. |
982+| mode | input | Selects the HMAC variant: <br>- HMAC384 <br>- HMAC512. |
983+| cptra_csr_hmac_key\[511:0\] | input | The key to be used during csr mode. |
984+| key\[511:0\] | input | The input key. |
1009985 | block\[1023:0\] | input | The input padded block of message. |
1010-| LFSR_seed\[159:0\] | Input | The input to seed PRNG to enable the masking countermeasure for SCA protection. |
986+| LFSR_seed\[383:0\] | Input | The input to seed PRNG to enable the masking countermeasure for SCA protection. |
1011987 | ready | output | When HIGH, the signal indicates the core is ready. |
1012-| tag\[383:0\] | output | The HMAC value of the given key or block. For PRF-HMAC-SHA-384, a 384-bit tag is required. For HMAC-SHA-384-192, the host is responsible for reading 192 bits from the MSB. |
988+| tag\[511:0\] | output | The HMAC value of the given key or block. For PRF-HMAC-SHA-512, a 512-bit tag is required. For HMAC-SHA-512-256, the host is responsible for reading 256 bits from the MSB. |
1013989 | tag_valid | output | When HIGH, the signal indicates the result is ready. |
1014990
1015991
@@ -1021,7 +997,7 @@
1021997
1022998 The following pseudocode demonstrates how the HMAC interface can be implemented.
1023999
1024-*Figure 30: HMAC pseudocode*
1000+*Figure 31: HMAC pseudocode*
10251001
10261002 ![](../images/caliptra-rtl/docs/images/HMAC_pseudo.png)
10271003
@@ -1033,7 +1009,7 @@
10331009
10341010 The embedded countermeasures are based on "Differential Power Analysis of HMAC Based on SHA-2, and Countermeasures" by McEvoy et al. To provide the required random values for masking intermediate values, a lightweight 74-bit LFSR is implemented. Based on “Spin Me Right Round Rotational Symmetry for FPGA-specific AES” by Wegener et al., LFSR is sufficient for masking statistical randomness.
10351011
1036-Each round of SHA512 execution needs 6,432 random bits, while one HMAC operation needs at least 4 rounds of SHA512 operations. However, the proposed architecture requires only 160-bit LFSR seed and provides first-order DPA attack protection at the cost of 10% latency overhead with negligible hardware resource overhead.
1012+Each round of SHA512 execution needs 6,432 random bits, while one HMAC operation needs at least 4 rounds of SHA512 operations. However, the proposed architecture requires only 384-bit LFSR seed and provides first-order DPA attack protection at the cost of 10% latency overhead with negligible hardware resource overhead.
10371013
10381014 ### Performance
10391015
@@ -1054,9 +1030,9 @@
10541030 | 128 KiB message | 207,979 | 519.947 | 1,923 |
10551031
10561032
1057-#### Hardware/software architecture
1058-
1059-In this architecture, the HMAC interface and controller are implemented in RISC-V core. The performance specification of the HMAC architecture is reported as shown in the following table.
1033+#### Hardware/software architecture
1034+
1035+In this architecture, the HMAC interface and controller are implemented in RISC-V core. The performance specification of the HMAC architecture is reported as shown in the following table.
10601036
10611037 | Operation | Cycle count \[CCs\] | Time \[us\] @ 400 MHz | Throughput \[op/s\] |
10621038 | :-------------------- | :------------------ | :-------------------- | :------------------ |
@@ -1090,7 +1066,7 @@
10901066
10911067 1. Set V_init = 0x01 0x01 0x01 ... 0x01 (V has 384-bit)
10921068 2. Set K_init = 0x00 0x00 0x00 ... 0x00 (K has 384-bit)
1093- 3. K_tmp = HMAC(K_init, V_init || 0x00 || entropy || nonce)
1069+ 3. K_tmp = HMAC(K_init, V_init || 0x00 || entropy || nonce)
10941070 4. V_tmp = HMAC(K_tmp, V_init)
10951071 5. K_new = HMAC(K_tmp, V_tmp || 0x01 || entropy || nonce)
10961072 6. V_new = HMAC(K_new, V_tmp)
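
Steps 1 to 6 above can be sketched in software as follows. This is a minimal illustration under assumptions: HMAC is taken to be HMAC-SHA-384 because K and V are 384 bits wide, and the function name `drbg_seed_state` and the 48-byte example inputs are hypothetical.

```python
import hashlib
import hmac


def hmac384(key: bytes, data: bytes) -> bytes:
    """HMAC-SHA-384 (assumed from the 384-bit K and V values)."""
    return hmac.new(key, data, hashlib.sha384).digest()


def drbg_seed_state(entropy: bytes, nonce: bytes):
    """Derive (K_new, V_new) from entropy and nonce, following steps 1 to 6."""
    v_init = b"\x01" * 48                                       # step 1
    k_init = b"\x00" * 48                                       # step 2
    k_tmp = hmac384(k_init, v_init + b"\x00" + entropy + nonce)  # step 3
    v_tmp = hmac384(k_tmp, v_init)                              # step 4
    k_new = hmac384(k_tmp, v_tmp + b"\x01" + entropy + nonce)    # step 5
    v_new = hmac384(k_new, v_tmp)                               # step 6
    return k_new, v_new


k_new, v_new = drbg_seed_state(entropy=b"\x11" * 48, nonce=b"\x22" * 48)
print(len(k_new) * 8, len(v_new) * 8)
```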
@@ -1138,13 +1114,15 @@
11381114
11391115 ## ECC
11401116
1141-The ECC unit includes the ECDSA (Elliptic Curve Digital Signature Algorithm) engine, offering a variant of the cryptographically secure Digital Signature Algorithm (DSA), which uses elliptic curve (ECC). A digital signature is an authentication method in which a public key pair and a digital certificate are used as a signature to verify the identity of a recipient or sender of information.
1117+The ECC unit includes the ECDSA (Elliptic Curve Digital Signature Algorithm) engine and the ECDH (Elliptic Curve Diffie-Hellman Key-Exchange) engine. These are the elliptic curve (ECC) variants of the Digital Signature Algorithm (DSA) and the Diffie-Hellman Key-Exchange (DH), respectively. A digital signature is an authentication method in which a public key pair and a digital certificate are used as a signature to verify the identity of a recipient or sender of information.
11421118
11431119 The hardware implementation supports deterministic ECDSA, 384 Bits (Prime Field), also known as NIST-Secp384r1, described in RFC6979.
11441120
1121+The hardware implementation also supports ECDH, 384 Bits (Prime Field), also known as NIST-Secp384r1, described in SP800-56A.
1122+
11451123 Secp384r1 parameters are shown in the following figure.
11461124
1147-*Figure 31: Secp384r1 parameters*
1125+*Figure 32: Secp384r1 parameters*
11481126
11491127 ![](../images/caliptra-rtl/docs/images/secp384r1_params.png)
11501128
@@ -1152,9 +1130,11 @@
11521130
11531131 The ECDSA consists of three operations, shown in the following figure.
11541132
1155-*Figure 32: ECDSA operations*
1133+*Figure 33: ECDSA operations*
11561134
11571135 ![](../images/caliptra-rtl/docs/images/ECDSA_ops.png)
1136+
1137+In addition, ECDH consists of the sharedkey generation operation.
11581138
11591139 #### KeyGen
11601140
@@ -1166,7 +1146,7 @@
11661146
11671147 #### Signing
11681148
1169-In the signing algorithm, a signature (r, s) is generated by Sign(privKey, h), taking a privKey and hash of message m, h = hash(m), using a cryptographic hash function, SHA384. The signing algorithm includes:
1149+In the signing algorithm, a signature (r, s) is generated by Sign(privKey, h), taking a privKey and hash of message m, h = hash(m), using a cryptographic hash function, SHA512. The signing algorithm includes:
11701150
11711151 * Generate a random number k in the range [1..n-1], while k = HMAC\_DRBG(privKey, h)
11721152 * Calculate the random point R = k × G
@@ -1176,24 +1156,32 @@
11761156
11771157 #### Verifying
11781158
1179-The signature (r, s) can be verified by Verify(pubKey ,h ,r, s) considering the public key pubKey and hash of message m, h=hash(m) using the same cryptographic hash function SHA384. The output is r’ value of verifying a signature. The ECDSA verify algorithm includes:
1159+The signature (r, s) can be verified by Verify(pubKey, h, r, s), which takes the public key pubKey and the hash of message m, h = hash(m), computed with the same cryptographic hash function, SHA512. The output is the value r’, which is compared against r. The ECDSA verify algorithm includes:
11801160
11811161 * Calculate s1 = s<sup>−1</sup> mod n
11821162 * Compute R' = (h × s1) × G + (r × s1) × pubKey
11831163 * Take r’ = R'x mod n, while R'x is x coordinate of R’=(R'x, R'y)
11841164 * Verify the signature by comparing whether r' == r
11851165
1166+#### ECDH sharedkey
1167+
1168+In ECDH sharedkey generation, the shared key is generated by ECDH_sharedkey(privKey_A, pubKey_B), which takes the caller's own private key and the other party's public key. The ECDH sharedkey algorithm is as follows:
1169+
1170+* Compute P = privKey_A × pubKey_B, where P = (Px, Py) is a point on the elliptic curve.
1171+* Output sharedkey = Px, where Px is x coordinate of P.
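
The Verifying and ECDH sharedkey steps above can be exercised on a toy curve. This sketch is illustrative only: it uses the textbook curve y² = x³ + 2x + 2 over GF(17) with base point (5, 1) of order 19 rather than secp384r1, plain double-and-add rather than the Montgomery ladder, and hypothetical function names. The example signature (r, s) = (10, 1) was worked out by hand for privKey = 5, h = 10, k = 3.

```python
# Toy curve y^2 = x^3 + 2x + 2 over GF(17); G = (5, 1) has prime order 19.
# Illustration only: the hardware operates on secp384r1.
P, A, N = 17, 2, 19
G = (5, 1)
INF = None  # point at infinity


def add(p1, p2):
    """Add two affine points on the toy curve."""
    if p1 is INF:
        return p2
    if p2 is INF:
        return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and (y1 + y2) % P == 0:
        return INF
    if p1 == p2:
        lam = (3 * x1 * x1 + A) * pow((2 * y1) % P, -1, P) % P
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P
    x3 = (lam * lam - x1 - x2) % P
    return (x3, (lam * (x1 - x3) - y1) % P)


def mul(k, pt):
    """Scalar multiplication k x pt by double-and-add (not constant time)."""
    acc = INF
    while k:
        if k & 1:
            acc = add(acc, pt)
        pt = add(pt, pt)
        k >>= 1
    return acc


def ecdsa_verify(pub, h, r, s):
    """Return True when r' == r, following the verify steps above."""
    if not (1 <= r < N and 1 <= s < N):
        return False
    s1 = pow(s, -1, N)                                  # s1 = s^-1 mod n
    rp = add(mul(h * s1 % N, G), mul(r * s1 % N, pub))  # R'
    return rp is not INF and rp[0] % N == r             # r' = R'x mod n


def ecdh_sharedkey(priv_own, pub_other):
    """Return Px, the x coordinate of priv_own x pub_other."""
    pt = mul(priv_own, pub_other)
    if pt is INF:
        raise ValueError("shared point is the point at infinity")
    return pt[0]


pub_a, pub_b = mul(5, G), mul(7, G)  # example key pairs for parties A and B
print(ecdsa_verify(pub_a, h=10, r=10, s=1))
print(ecdh_sharedkey(5, pub_b) == ecdh_sharedkey(7, pub_a))
```

Both parties arrive at the same x coordinate because privKey_A × (privKey_B × G) and privKey_B × (privKey_A × G) are the same point.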
1172+
1173+
11861174 ### Architecture
11871175
11881176 The ECC top-level architecture is shown in the following figure.
11891177
1190-*Figure 33: ECDSA architecture*
1191-
1192-![](../images/caliptra-rtl/docs/images/ECDSA_arch.png)
1178+*Figure 34: ECC architecture*
1179+
1180+![](../images/caliptra-rtl/docs/images/ECC_arch.png)
11931181
11941182 ### Signal descriptions
11951183
1196-The ECDSA architecture inputs and outputs are described in the following table.
1184+The ECC architecture inputs and outputs are described in the following table.
11971185
11981186
11991187 | Name | Input or output | Description |
@@ -1206,49 +1194,56 @@
12061194 | nonce \[383:0\] | input | The deterministic nonce for HMAC_DRBG in the KeyGen operation. |
12071195 | privKey_in\[383:0\] | input | The input private key used in the signing operation. |
12081196 | pubKey_in\[1:0\]\[383:0\] | input | The input public key(x,y) used in the verifying operation. |
1209-| hashed_msg\[383:0\] | input | The hash of message using SHA384. |
1197+| hashed_msg\[383:0\] | input | The hash of message using SHA512. |
12101198 | ready | output | When HIGH, the signal indicates the core is ready. |
12111199 | privKey_out\[383:0\] | output | The generated private key in the KeyGen operation. |
12121200 | pubKey_out\[1:0\]\[383:0\] | output | The generated public key(x,y) in the KeyGen operation. |
12131201 | r\[383:0\] | output | The signature value of the given privKey/message. |
12141202 | s\[383:0\] | output | The signature value of the given privKey/message. |
12151203 | r’\[383:0\] | Output | The signature verification result. |
1204+| DH_sharedkey\[383:0\] | output | The generated shared key in the ECDH sharedkey operation. |
12161205 | valid | output | When HIGH, the signal indicates the result is ready. |
12171206
12181207
12191208 ### Address map
12201209
1221-The ECDSA address map is shown here: [ecc\_reg — clp Reference (chipsalliance.github.io)](https://chipsalliance.github.io/caliptra-rtl/main/internal-regs/?p=clp.ecc_reg).
1210+The ECC address map is shown here: [ecc\_reg — clp Reference (chipsalliance.github.io)](https://chipsalliance.github.io/caliptra-rtl/main/internal-regs/?p=clp.ecc_reg).
12221211
12231212 ### Pseudocode
12241213
1225-The following pseudocode blocks demonstrate example implementations for KeyGen, Signing, and Verifying.
1214+The following pseudocode blocks demonstrate example implementations for KeyGen, Signing, Verifying, and ECDH sharedkey.
12261215
12271216 #### KeyGen
12281217
1229-*Figure 34: KeyGen pseudocode*
1218+*Figure 35: KeyGen pseudocode*
12301219
12311220 ![](../images/caliptra-rtl/docs/images/keygen_pseudo.png)
12321221
12331222 #### Signing
12341223
1235-*Figure 35: Signing pseudocode*
1224+*Figure 36: Signing pseudocode*
12361225
12371226 ![](../images/caliptra-rtl/docs/images/signing_pseudo.png)
12381227
12391228 #### Verifying
12401229
1241-*Figure 36: Verifying pseudocode*
1230+*Figure 37: Verifying pseudocode*
12421231
12431232 ![](../images/caliptra-rtl/docs/images/verify_pseudo.png)
12441233
1234+#### ECDH sharedkey
1235+
1236+*Figure 38: ECDH sharedkey pseudocode*
1237+
1238+![](../images/caliptra-rtl/docs/images/sharedkey_pseudo.png)
1239+
12451240 ### SCA countermeasure
12461241
1247-The described ECDSA has three main routines: KeyGen, Signing, and Verifying. Since the Verifying routine requires operation with public values rather than a secret value, our side-channel analysis does not cover this routine. Our evaluation covers the KeyGen and Signing routines where the secret values are processed.
1248-
1249-KeyGen consists of HMAC DRBG and scalar multiplication, while Signing first requires a message hashing and then follows the same operations as KeyGen (HMAC DRBG and scalar multiplication). The last step of Signing is generating “S” as the proof of signature. Since HMAC DRBG and hash operations are evaluated separately in our document, this evaluation covers scalar multiplication and modular arithmetic operations.
1250-
1251-#### Scalar multiplication
1242+The described ECC has four main routines: KeyGen, Signing, Verifying, and ECDH sharedkey. Since the Verifying routine requires operation with public values rather than a secret value, our side-channel analysis does not cover this routine. Our evaluation covers the KeyGen, Signing, and ECDH sharedkey routines where the secret values are processed.
1243+
1244+KeyGen consists of HMAC DRBG and scalar multiplication, while Signing first requires a message hashing and then follows the same operations as KeyGen (HMAC DRBG and scalar multiplication). The last step of Signing is generating “S” as the proof of signature. Since HMAC DRBG and hash operations are evaluated separately in our document, this evaluation covers scalar multiplication and modular arithmetic operations.
1245+
1246+#### Scalar multiplication
12521247
12531248 To perform the scalar multiplication, the Montgomery ladder is implemented, which is inherently resistant to timing and single power analysis (SPA) attacks.
12541249
@@ -1256,7 +1251,7 @@
12561251
12571252 To protect the architecture against horizontal power/electromagnetic (EM) and differential power analysis (DPA) attacks, several countermeasures are embedded in the design [9]. Since these countermeasures require random inputs, HMAC-DRBG is fed by IV to generate these random values.
12581253
1259-Since HMAC-DRBG generates random value in a deterministic way, firmware MUST feed different IV to ECC engine for EACH keygen and signing operation.
1254+Since HMAC-DRBG generates random value in a deterministic way, firmware MUST feed different IV to ECC engine for EACH keygen, signing, and ECDH sharedkey operation.
12601255
12611256 #### Base point randomization
12621257
@@ -1284,7 +1279,7 @@
12841279
12851280 Generating “S” as the proof of signature at the steps of the signing operation leaks where the hashed message is signed with private key and ephemeral key as follows:
12861281
1287-Since the given message is known or the signature part r is known, the attacker can perform a known-plaintext attack. The attacker can sign multiple messages with the same key, or the attacker can observe part of the signature that is generated with multiple messages but the same key.
1282+Since the given message is known or the signature part r is known, the attacker can perform a known-plaintext attack. The attacker can sign multiple messages with the same key, or the attacker can observe part of the signature that is generated with multiple messages but the same key.
12881283
12891284 The evaluation shows that the CPA attack can be performed with a small number of traces. Thus, an arithmetic masked design for these operations is implemented.
12901285
@@ -1292,7 +1287,7 @@
12921287
12931288 This countermeasure is achieved by randomizing the privkey as follows:
12941289
1295-Although computation of “S” seems the most vulnerable point in our scheme, the operation does not have a big contribution to overall latency. Hence, masking these operations has low overhead on the cost of the design.
1290+Although computation of “S” seems the most vulnerable point in our scheme, the operation does not have a big contribution to overall latency. Hence, masking these operations has low overhead on the cost of the design.
12961291
12971292 #### Random number generator for SCA countermeasure
12981293
@@ -1304,7 +1299,7 @@
13041299 2. KEYGEN PRIVKEY: Running HMAC\_DRBG with seed and nonce to generate the privkey in KEYGEN operation.
13051300 3. SIGNING NONCE: Running HMAC\_DRBG based on RFC6979 in SIGNING operation with privkey and hashed\_msg.
13061301
1307-*Figure 37: HMAC\_DRBG utilization*
1302+*Figure 39: HMAC\_DRBG utilization*
13081303
13091304 ![](../images/caliptra-rtl/docs/images/HMAC_DRBG_util.png)
13101305
@@ -1320,7 +1315,7 @@
13201315
13211316 The data flow of the HMAC\_DRBG operation in keygen operation mode is shown in the following figure.
13221317
1323-*Figure 38: HMAC\_DRBG data flow*
1318+*Figure 40: HMAC\_DRBG data flow*
13241319
13251320 ![](../images/caliptra-rtl/docs/images/HMAC_DRBG_data.png)
13261321
@@ -1330,7 +1325,7 @@
13301325
13311326 In practice, observing a t-value greater than a specific threshold (mainly 4.5) indicates the presence of leakage. However, in ECC, due to its latency, around 5 million samples are required to be captured. This latency leads to many false positives and the TVLA threshold can be considered a higher value than 4.5. Based on the following figure from “Side-Channel Analysis and Countermeasure Design for Implementation of Curve448 on Cortex-M4” by Bisheh-Niasar et al., the threshold can be considered equal to 7 in our case.
13321327
1333-*Figure 39: TVLA threshold as a function of the number of samples per trace*
1328+*Figure 41: TVLA threshold as a function of the number of samples per trace*
13341329
13351330 ![](../images/caliptra-rtl/docs/images/TVLA_threshold.png)
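
The threshold comparison can be sketched with Welch's t-statistic, which is the statistic TVLA normally uses. This is a minimal single-sample-point illustration with made-up data; the function names are hypothetical, and a real evaluation computes the statistic at every sample point of the captured traces.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t-statistic for two sets of measurements at one sample point."""
    spread = variance(group_a) / len(group_a) + variance(group_b) / len(group_b)
    return (mean(group_a) - mean(group_b)) / sqrt(spread)


def leakage_detected(group_a, group_b, threshold=4.5):
    """Flag leakage when |t| exceeds the threshold (4.5 by default)."""
    return abs(welch_t(group_a, group_b)) > threshold


# Made-up data: two groups with clearly different means.
fixed_inputs = [0.0, 1.0] * 50
random_inputs = [10.0, 11.0] * 50
print(leakage_detected(fixed_inputs, random_inputs))
print(leakage_detected(fixed_inputs, fixed_inputs, threshold=7))
```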
13361331
@@ -1340,7 +1335,7 @@
13401335 The TVLA results for performing seed/nonce-dependent leakage detection using 200,000 traces are shown in the following figure. Based on this figure, there is no leakage in ECC keygen by changing the seed/nonce after 200,000 operations.
13411336
13421337
1343-*Figure 40: seed/nonce-dependent leakage detection using TVLA for ECC keygen after 200,000 traces*
1338+*Figure 42: seed/nonce-dependent leakage detection using TVLA for ECC keygen after 200,000 traces*
13441339
13451340 ![](../images/caliptra-rtl/docs/images/tvla_keygen.png)
13461341
@@ -1348,13 +1343,13 @@
13481343
13491344 The TVLA results for performing privkey-dependent leakage detection using 20,000 traces are shown in the following figure. Based on this figure, there is no leakage in ECC signing by changing the privkey after 20,000 operations.
13501345
1351-*Figure 41: privkey-dependent leakage detection using TVLA for ECC signing after 20,000 traces*
1346+*Figure 43: privkey-dependent leakage detection using TVLA for ECC signing after 20,000 traces*
13521347
13531348 ![](../images/caliptra-rtl/docs/images/TVLA_privekey.png)
13541349
13551350 The TVLA results for performing message-dependent leakage detection using 64,000 traces are shown in the following figure. Based on this figure, there is no leakage in ECC signing by changing the message after 64,000 operations.
13561351
1357-*Figure 42: Message-dependent leakage detection using TVLA for ECC signing after 64,000 traces*
1352+*Figure 44: Message-dependent leakage detection using TVLA for ECC signing after 64,000 traces*
13581353
13591354 ![](../images/caliptra-rtl/docs/images/TVLA_msg_dependent.png)
13601355
@@ -1391,17 +1386,17 @@
13911386
13921387 ## LMS Accelerator
13931388
1394-LMS cryptography is a type of hash-based digital signature scheme that was standardized by NIST in 2020. It is based on the Leighton-Micali Signature (LMS) system, which uses a Merkle tree structure to combine many one-time signature (OTS) keys into a single public key. LMS cryptography is resistant to quantum attacks and can achieve a high level of security without relying on large integer mathematics.
1389+LMS cryptography is a type of hash-based digital signature scheme that was standardized by NIST in 2020. It is based on the Leighton-Micali Signature (LMS) system, which uses a Merkle tree structure to combine many one-time signature (OTS) keys into a single public key. LMS cryptography is resistant to quantum attacks and can achieve a high level of security without relying on large integer mathematics.
13951390
13961391 Caliptra supports only LMS verification using a software/hardware co-design approach. Hence, the LMS accelerator reuses the SHA256 engine to speed up the Winternitz chain by removing software-hardware interface overhead. The LMS-OTS verification algorithm is shown in the following figure:
13971392
1398-*Figure 43: LMS-OTS Verification algorithm*
1393+*Figure 45: LMS-OTS Verification algorithm*
13991394
14001395 ![](../images/caliptra-rtl/docs/images/LMS_verifying_alg.png)
14011396
14021397 The high-level architecture of LMS is shown in the following figure.
14031398
1404-*Figure 44: LMS high-level architecture*
1399+*Figure 46: LMS high-level architecture*
14051400
14061401 ![](../images/caliptra-rtl/docs/images/LMS_high_level.png)
14071402
@@ -1426,7 +1421,7 @@
14261421
14271422 The Winternitz hash chain can be accelerated in hardware to enhance the performance of the design. For that, a configurable architecture is proposed that can reuse the SHA256 engine. The LMS accelerator architecture is shown in the following figure, where H is the SHA256 engine.
14281423
1429-*Figure 45: Winternitz chain architecture*
1424+*Figure 47: Winternitz chain architecture*
14301425
14311426 ![](../images/caliptra-rtl/docs/images/LMS_wntz_arch.png)
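
A software view of one Winternitz chain may help when reading the figure. The sketch below follows the chain iteration of the LMS-OTS verification algorithm in RFC 8554 and is not taken from the RTL; the function name `winternitz_chain`, its `end` parameter, and the example inputs are hypothetical.

```python
import hashlib


def winternitz_chain(identifier: bytes, q: int, i: int, tmp: bytes,
                     start: int, w: int = 8, n: int = 32, end=None) -> bytes:
    """Iterate H from step `start` up to step 2^w - 1 (exclusive).

    identifier: 16-byte key pair identifier (I in RFC 8554)
    q: leaf number, i: chain index, tmp: current n-byte chain value
    """
    if end is None:
        end = (1 << w) - 1
    for j in range(start, end):
        data = (identifier + q.to_bytes(4, "big") + i.to_bytes(2, "big")
                + j.to_bytes(1, "big") + tmp)
        tmp = hashlib.sha256(data).digest()[:n]  # H is SHA256, truncated to n
    return tmp


I = bytes(16)   # example identifier
y = bytes(32)   # example signature element for chain 0
z = winternitz_chain(I, q=0, i=0, tmp=y, start=5)
print(z.hex()[:16])
```

Each chain is a strictly sequential run of hash calls, which is why keeping the loop in hardware next to the SHA256 engine avoids a software round trip per iteration.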
14321427
@@ -1456,10 +1451,794 @@
14561451
14571452 The address map for LMS accelerator integrated into SHA256 is shown here: [sha256\_reg — clp Reference (chipsalliance.github.io)](https://chipsalliance.github.io/caliptra-rtl/main/internal-regs/?p=clp.sha256_reg).
14581453
1454+## Adams Bridge - Dilithium (ML-DSA)
1455+
1456+Please refer to the [Adams-bridge specification](https://github.com/chipsalliance/adams-bridge/blob/main/docs/AdamsBridgeHardwareSpecification.md).
1457+
1458+### Address map
1459+The address map of the ML-DSA accelerator is shown here: [ML-DSA\_reg — clp Reference (chipsalliance.github.io)](https://chipsalliance.github.io/caliptra-rtl/main/internal-regs/?p=clp.mldsa_reg)
1460+
1461+## AES
1462+
1463+The AES unit is a cryptographic accelerator that processes requests from the processor to encrypt or decrypt 16-byte data blocks. It supports AES-128/192/256 in various modes, including Electronic Codebook (ECB), Cipher Block Chaining (CBC), Cipher Feedback (CFB) with a fixed segment size of 128 bits (CFB-128), Output Feedback (OFB), Counter (CTR), and Galois/Counter Mode (GCM).
1464+
1465+The AES unit is reused from OpenTitan (see [aes](https://github.com/lowRISC/opentitan/tree/master/hw/ip/aes)) with a shim to translate from AHB-lite to the TL-UL interface.
1466+
1467+Additional registers have been added to support key vault integration. Keys from the key vault can be loaded into the AES unit to be used for encryption or decryption.
1468+
1469+### Operation
1470+
1471+For more information, see the [AES Programmer's Guide](https://github.com/vogelpi/opentitan/blob/aes-gcm-review/hw/ip/aes/doc/programmers_guide.md).
1472+
1473+### Signal descriptions
1474+
1475+The AES architecture inputs and outputs are described in the following table.
1476+
1477+| Name | Input or output | Description |
1478+| :--------------------------------- | :-------------- | :----------- |
1479+| clk | input | All signal timings are related to the rising edge of clk. |
1480+| reset_n | input | The reset signal is active LOW and resets the core. This is the only active LOW signal. |
1481+| DATA_IN | input | Input block to be encrypted or decrypted. Written in four 32-bit registers. |
1482+| DATA_OUT | output | Output block result of encryption or decryption. Stored in four 32-bit registers. |
1483+| CTRL_SHADOWED.MANUAL_OPERATION | input | Configures the AES core to operate in manual mode. |
1484+| CTRL_SHADOWED.PRNG_RESEED_RATE | input | Configures the rate of reseeding the internal PRNG used for masking. |
1485+| CTRL_SHADOWED.SIDELOAD | input | When asserted, AES core will use the key from the keyvault interface. |
1486+| CTRL_SHADOWED.KEY_LEN | input | Configures the AES key length. Supports 128, 192, and 256-bit keys. |
1487+| CTRL_SHADOWED.MODE | input | Configures the AES block cipher mode. |
1488+| CTRL_SHADOWED.OPERATION | input | Configures the AES core to operate in encryption or decryption modes. |
1489+| CTRL_GCM_SHADOWED.PHASE | input | Configures the GCM phase. |
1490+| CTRL_GCM_SHADOWED.NUM_VALID_BYTES | input | Configures the number of valid bytes of the current input block in GCM. |
1491+| TRIGGER.PRNG_RESEED | input | Forces a PRNG reseed. |
1492+| TRIGGER.DATA_OUT_CLEAR | input | Clears the DATA_OUT registers with pseudo-random data. |
1493+| TRIGGER.KEY_IV_DATA_IN_CLEAR | input | Clears the Key, IV, and DATA_IN registers with pseudo-random data. |
1494+| TRIGGER.START | input | Triggers the encryption/decryption of one data block if in manual operation mode. |
1495+| STATUS.ALERT_FATAL_FAULT | output | A fatal fault has occurred and the AES unit needs to be reset. |
1496+| STATUS.ALERT_RECOV_CTRL_UPDATE_ERR | output | An update error has occurred in the shadowed Control Register. AES operation needs to be restarted by re-writing the Control Register. |
1497+| STATUS.INPUT_READY | output | The AES unit is ready to receive new data input via the DATA_IN registers. |
1498+| STATUS.OUTPUT_VALID | output | The AES unit has valid output data. |
1499+| STATUS.OUTPUT_LOST | output | All previous output data has been fully read by the processor (0) or at least one previous output data block has been lost (1). It has been overwritten by the AES unit before the processor could fully read it. Once set to 1, this flag remains set until AES operation is restarted by re-writing the Control Register. The primary use of this flag is for design verification. This flag is not meaningful if MANUAL_OPERATION=0. |
1500+| STATUS.STALL | output | The AES unit is stalled because there is previous output data that must be read by the processor before the AES unit can overwrite this data. This flag is not meaningful if MANUAL_OPERATION=1. |
1501+| STATUS.IDLE | output | The AES unit is idle. |
1502+
1503+
1504+
1505+### Address map
1506+
1507+The AES address map is shown here: [aes\_clp\_reg — clp Reference (chipsalliance.github.io)](https://chipsalliance.github.io/caliptra-rtl/main/internal-regs/?p=clp.aes_clp_reg).
1508+
1509+### SCA countermeasures
1510+
1511+The AES unit employs separate SCA countermeasures for the AES cipher core used for the encryption/decryption part and for the GHASH module used for computing the integrity tag in GCM.
1512+
1513+### AES cipher core
1514+
1515+A detailed specification of the SCA countermeasure employed in the AES cipher core is shown here: [AES cipher core SCA countermeasure](https://opentitan.org/book/hw/ip/aes/doc/theory_of_operation.html#1st-order-masking-of-the-cipher-core).
1516+The most critical building block of the SCA countermeasure, i.e., the masked AES S-Box, successfully passes formal masking verification at the netlist level using [Alma: Execution-aware Masking Verification](https://github.com/IAIK/coco-alma).
1517+The flow required for repeating the formal masking verification using Alma together with a Howto can be found [here](https://github.com/lowRISC/opentitan/blob/master/hw/ip/aes/pre_sca/alma/README.md).
1518+The entire AES cipher core, including the masked S-Boxes as well as the PRNG generating the randomness for remasking, successfully passes masking evaluation at the netlist level using [PROLEAD - A Probing-Based Leakage Detection Tool for Hardware and Software](https://github.com/ChairImpSec/PROLEAD).
1519+The flow required for repeating the masking evaluation using PROLEAD together with a Howto can be found [here](https://github.com/lowRISC/opentitan/blob/aes-gcm-review/hw/ip/aes/pre_sca/prolead/README.md).
1520+
1521+### GHASH module
1522+
1523+A detailed specification of the SCA countermeasure employed in the GHASH module is shown here: [GHASH module SCA countermeasure](https://github.com/vogelpi/opentitan/blob/aes-gcm-review/hw/ip/aes/doc/theory_of_operation.md#1st-order-masking-of-the-ghash-module).
1524+
1525+To optimize and verify this masking countermeasure, two different types of experiments have been performed for which the results are given below.
1526+1. Formal masking verification using [Alma: Execution-aware Masking Verification](https://github.com/IAIK/coco-alma).
1527+ These experiments led to a [series of small design optimizations](https://github.com/vogelpi/opentitan/pull/18) which have been integrated into Caliptra.
1528+ The resulting design successfully passes formal masking verification at the netlist level.
1529+1. [Test-vector leakage assessment (TVLA)](https://www.rambus.com/wp-content/uploads/2015/08/TVLA-DTR-with-AES.pdf) applied to power SCA traces captured on a ChipWhisperer-based FPGA setup.
1530+ These experiments confirm the formal masking verification results:
1531+ No 1st-order SCA can be observed during the GHASH operation.
1532+ The leakage observed at the boundary of and outside the GHASH operation can be attributed to the evaluation methodology and the handling of unmasked and uncritical data, as well as to FPGA-specific leakage effects known from literature.
1533+ We are confident that the optimized SCA hardening concept effectively deters SCA attacks.
1534+
1535+#### Formal masking verification using Alma
1536+
1537+[Alma](https://ieeexplore.ieee.org/document/9617707) is an open source, formal masking verification tool developed at TU Graz which enables formal verification of masking SCA countermeasures at the netlist level.
1538+The main advantages of this approach compared to analyzing FPGA power traces are as follows:
1539+
1540+* The turn-around time is much faster as it does not involve FPGA bitstream generation and capturing power traces (both can take several hours).
1541+* Netlist-based analysis tools typically enable pinpointing sources of SCA leakage and easily allow analyzing sub parts of the masked design individually.
1542+ As a result, individual issues can be fixed up faster.
1543+* The analyzed netlist is closer to the targeted ASIC implementation.
1544+ During FPGA synthesis, the netlist is mapped to the logic elements, such as look-up tables (LUTs), available on the selected FPGA, which are fundamentally different from simpler ASIC gates.
1545+
1546+However, formal netlist analysis tools may not be perfect and they also have limitations in terms of what can be analyzed.
1547+For example, the maximum supported netlist size depends on the complexity and number of the non-linear elements.
1548+Also, random number generators and in particular pseudo-random number generators typically need to be excluded from the analysis and random number inputs need to be assumed as ideal by tools.
1549+Thus, they don’t replace FPGA-based analysis.
1550+We use them to increase our confidence in our SCA countermeasures and to close countermeasure verification faster by reducing the number of FPGA evaluation runs.
1551+
1552+##### Prerequisites
1553+
1554+The [Alma-based formal masking verification flow together with a Howto](https://github.com/vogelpi/opentitan/tree/aes-gcm-review/hw/ip/aes/pre_sca/alma#readme) (including installation instructions) as well as an [open source Yosys synthesis flow](https://github.com/vogelpi/opentitan/tree/aes-gcm-review/hw/ip/aes/pre_syn) are available as open source.
1555+The tool can run both on generic Yosys netlists and on proprietary, technology-specific netlists.
1556+For the latter, a [slightly modified verification flow with an additional translation step](https://github.com/vogelpi/opentitan/tree/aes-gcm-review/hw/ip/aes/pre_sca/alma_post_syn#readme) is required.
1557+To verify the GHASH SCA countermeasure, the generic flow was used with the following tool versions:
1558+
1559+* Alma ([specific commit](https://github.com/vogelpi/coco-alma/commit/68e436f67dee7d27fb782864dc5523ceb4bd27bf))
1560+* Yosys 0.36 (git sha1 8f07a0d84)
1561+* sv2v v0.0.11-28-g81d8225
1562+* Verilator 4.214 2021-10-17 rev v4.214
1563+
1564+##### Yosys Netlist Synthesis
1565+
1566+Set up the [open source Yosys synthesis flow](https://github.com/vogelpi/opentitan/tree/aes-gcm-review/hw/ip/aes/pre_syn) by copying the [`syn_setup.example.sh`](https://github.com/vogelpi/opentitan/blob/aes-gcm-review/hw/ip/aes/pre_syn/syn_setup.example.sh) file and renaming it to `syn_setup.sh`.
1567+Change the `LR_SYNTH_TOP_MODULE` variable to `aes_ghash_wrap` and the `LR_SYNTH_CELL_LIBRARY_PATH` to the `NangateOpenCellLibrary_typical.lib` file in the folder where you installed the nangate45 library.
1568+
1569+Then, start the synthesis by executing
1570+
1571+```sh
1572+./syn_yosys.sh
1573+```
1574+This should produce output similar to what is shown below:
1575+
1576+```
1577+8. Printing statistics.
1578+
1579+=== aes_ghash_wrap ===
1580+
1581+ Number of wires: 24543
1582+ Number of wire bits: 29339
1583+ Number of public wires: 567
1584+ Number of public wire bits: 5363
1585+ Number of memories: 0
1586+ Number of memory bits: 0
1587+ Number of processes: 0
1588+ Number of cells: 26214
1589+ AND2_X1 1585
1590+ AND3_X1 4
1591+ AND4_X1 32
1592+ AOI211_X1 58
1593+ AOI21_X1 293
1594+ AOI221_X1 215
1595+ AOI22_X1 364
1596+ DFFR_X1 1468
1597+ DFFS_X1 5
1598+ INV_X1 584
1599+ MUX2_X1 1252
1600+ NAND2_X1 1870
1601+ NAND3_X1 128
1602+ NAND4_X1 37
1603+ NOR2_X1 7551
1604+ NOR3_X1 445
1605+ NOR4_X1 28
1606+ OAI211_X1 98
1607+ OAI21_X1 827
1608+ OAI221_X1 3
1609+ OAI22_X1 183
1610+ OR2_X1 28
1611+ OR3_X1 67
1612+ OR4_X1 2
1613+ XNOR2_X1 7122
1614+ XOR2_X1 1965
1615+
1616+ Chip area for module '\aes_ghash_wrap': 37534.728000
1617+
1618+====== End Yosys Stat Report ======
1619+
1620+Warnings: 20 unique messages, 102 total
1621+
1622+End of script. Logfile hash: 16c4d13569, CPU: user 25.11s system 0.12s, MEM: 176.29 MB peak
1623+Yosys 0.36 (git sha1 8f07a0d84, gcc 11.4.0-1ubuntu1~22.04 -fPIC -Os)
1624+Time spent: 66% 2x abc (47 sec), 9% 40x opt_expr (6 sec), ...
1625+Area in kGE = 47.04
1626+```
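The `Area in kGE` line follows from dividing the reported chip area by the area of one gate equivalent. The following is a minimal sketch of that conversion. It assumes that one gate equivalent is a `NAND2_X1` cell with an area of 0.798 µm² in the nangate45 library; this cell area is not stated in the log above and is used here because it reproduces the reported figure. The function name is hypothetical.

```python
# Assumption: one gate equivalent (GE) is a NAND2_X1 cell of 0.798 um^2.
NAND2_X1_AREA_UM2 = 0.798


def area_to_kge(chip_area_um2, ge_area_um2=NAND2_X1_AREA_UM2):
    """Convert a Yosys chip area (um^2) into kilo gate equivalents."""
    return chip_area_um2 / ge_area_um2 / 1000.0


# Chip area for module '\aes_ghash_wrap' taken from the report above.
kge = area_to_kge(37534.728)
print(f"Area in kGE = {kge:.2f}")  # prints "Area in kGE = 47.04"
```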
1627+
1628+Note that the reported area is noticeably larger than the number reported in the [GHASH SCA countermeasure specification](https://github.com/vogelpi/opentitan/blob/aes-gcm-review/hw/ip/aes/doc/theory_of_operation.md#1st-order-masking-of-the-ghash-module).
1629+The reasons are twofold:
1630+
1631+1. The `aes_ghash_wrap` module synthesized is a wrapper module around the GHASH module in focus of this analysis.
1632+ The goal of the wrapper is to separately feed in secrets (the hash subkey H and the encrypted initial counter block S) as well as randomness in a tool-aware manner.
1633+ As such, the wrapper includes some additional muxing resources and a counter to ease interpretation of results.
1634+2. To speed up the formal analysis, the pipelined Galois-field multipliers have been instantiated with a latency of 4 instead of 32 clock cycles as on FPGA.
1635+ While the latency, or more precisely the processing parallelism, does have an impact on the SNR, it does not have an impact on the formal netlist analysis, which is performed in an essentially noise-free environment.
1636+
1637+##### Formal Netlist Analysis
1638+
1639+After synthesizing the netlist, the following steps should be taken to perform the analysis:
1640+
1641+1. Make sure to source the `build_consts.sh` script
1642+ ```sh
1643+ source util/build_consts.sh
1644+ ```
1645+ in order to set up some shell variables.
1646+
1647+1. Enter the directory where you have downloaded Alma and load the virtual Python environment
1648+ ```sh
1649+ source dev/bin/activate
1650+ ```
1651+
1652+1. Launch the Alma tool to parse, trace (simulate) and formally verify the netlist.
1653+ For simplicity, a single script is provided to launch all the required steps with a single command.
1654+ Simply run
1655+ ```sh
1656+ ${REPO_TOP}/hw/ip/aes/pre_sca/alma/verify_aes_ghash.sh
1657+ ```
1658+ This should produce output similar to the one below:
1659+ ```
1660+ Verifying aes_ghash_wrap using Alma
1661+ Starting yosys synthesis...
1662+ | CircuitGraph | Total: 29882 | Linear: 9091 | Non-linear: 12741 | Registers: 1473 | Mux: 3538 |
1663+ parse.py successful (47.99s)
1664+ 1: Running verilator on given netlist
1665+ 2: Compiling verilated netlist library
1666+ 3: Compiling provided verilator testbench
1667+ 4: Simulating circuit and generating VCD
1668+ | CircuitGraph | Total: 29882 | Linear: 9091 | Non-linear: 12741 | Registers: 1473 | Mux: 3538 |
1669+ tmp/tmp.vcd:24765: [WARNING] Entry for name alert_fatal_i already exists in namemap (alert_fatal_i -> Ce")
1670+ tmp/tmp.vcd:24766: [WARNING] Entry for name alert_o already exists in namemap (alert_o -> De")
1671+ tmp/tmp.vcd:24767: [WARNING] Entry for name clear_i already exists in namemap (clear_i -> Ee")
1672+ tmp/tmp.vcd:24768: [WARNING] Entry for name clk_i already exists in namemap (clk_i -> Fe")
1673+ tmp/tmp.vcd:24770: [WARNING] Entry for name cyc_ctr_o already exists in namemap (cyc_ctr_o -> Ge")
1674+ tmp/tmp.vcd:24771: [WARNING] Entry for name data_in_prev_i already exists in namemap (data_in_prev_i -> He")
1675+ tmp/tmp.vcd:24772: [WARNING] Entry for name data_out_i already exists in namemap (data_out_i -> Le")
1676+ tmp/tmp.vcd:24773: [WARNING] Entry for name first_block_o already exists in namemap (first_block_o -> Pe")
1677+ tmp/tmp.vcd:24774: [WARNING] Entry for name gcm_phase_i already exists in namemap (gcm_phase_i -> Qe")
1678+ tmp/tmp.vcd:24775: [WARNING] Entry for name ghash_state_done_o already exists in namemap (ghash_state_done_o -> Re")
1679+ tmp/tmp.vcd:24776: [WARNING] Entry for name hash_subkey_i already exists in namemap (hash_subkey_i -> Ve")
1680+ tmp/tmp.vcd:24777: [WARNING] Entry for name in_ready_o already exists in namemap (in_ready_o -> ^e")
1681+ tmp/tmp.vcd:24778: [WARNING] Entry for name in_valid_i already exists in namemap (in_valid_i -> _e")
1682+ tmp/tmp.vcd:24779: [WARNING] Entry for name load_hash_subkey_i already exists in namemap (load_hash_subkey_i -> `e")
1683+ tmp/tmp.vcd:24780: [WARNING] Entry for name num_valid_bytes_i already exists in namemap (num_valid_bytes_i -> ae")
1684+ tmp/tmp.vcd:24781: [WARNING] Entry for name op_i already exists in namemap (op_i -> be")
1685+ tmp/tmp.vcd:24782: [WARNING] Entry for name out_ready_i already exists in namemap (out_ready_i -> ce")
1686+ tmp/tmp.vcd:24783: [WARNING] Entry for name out_valid_o already exists in namemap (out_valid_o -> de")
1687+ tmp/tmp.vcd:24784: [WARNING] Entry for name prd_i already exists in namemap (prd_i -> ee")
1688+ tmp/tmp.vcd:24785: [WARNING] Entry for name rst_ni already exists in namemap (rst_ni -> me")
1689+ tmp/tmp.vcd:24786: [WARNING] Entry for name s_i already exists in namemap (s_i -> ne")
1690+ 0
1691+ 0
1692+ Building formula for cycle 0: vars 0 clauses 0
1693+ Checking cycle 0:
1694+ Building formula for cycle 1: vars 1024 clauses 1536
1695+ Checking cycle 1:
1696+ Building formula for cycle 2: vars 3968 clauses 6528
1697+ Checking cycle 2:
1698+ Building formula for cycle 3: vars 6298 clauses 11026
1699+ Checking cycle 3:
1700+ Building formula for cycle 4: vars 14888 clauses 34886
1701+ Checking cycle 4:
1702+ Building formula for cycle 5: vars 20924 clauses 52734
1703+ Checking cycle 5:
1704+ Building formula for cycle 6: vars 53986 clauses 143674
1705+ Checking cycle 6:
1706+ Building formula for cycle 7: vars 57570 clauses 150970
1707+ Checking cycle 7:
1708+ Building formula for cycle 8: vars 80484 clauses 169282
1709+ Checking cycle 8:
1710+ Building formula for cycle 9: vars 213770 clauses 504198
1711+ Checking cycle 9:
1712+ Building formula for cycle 10: vars 594390 clauses 1617276
1713+ Checking cycle 10:
1714+ Building formula for cycle 11: vars 1024018 clauses 2881744
1715+ Checking cycle 11:
1716+ Building formula for cycle 12: vars 1704424 clauses 4910342
1717+ Checking cycle 12:
1718+ Building formula for cycle 13: vars 1713897 clauses 4915466
1719+ Checking cycle 13:
1720+ Building formula for cycle 14: vars 1834911 clauses 5233038
1721+ Checking cycle 14:
1722+ Building formula for cycle 15: vars 2258841 clauses 6492446
1723+ Checking cycle 15:
1724+ Building formula for cycle 16: vars 2734646 clauses 7907830
1725+ Checking cycle 16:
1726+ Building formula for cycle 17: vars 5868600 clauses 18374416
1727+ Checking cycle 17:
1728+ Building formula for cycle 18: vars 5922747 clauses 18524578
1729+ Checking cycle 18:
1730+ Building formula for cycle 19: vars 6100898 clauses 19061808
1731+ Checking cycle 19:
1732+ Building formula for cycle 20: vars 6427297 clauses 20074334
1733+ Checking cycle 20:
1734+ Building formula for cycle 21: vars 6949506 clauses 21693947
1735+ Checking cycle 21:
1736+ Building formula for cycle 22: vars 6949506 clauses 21693947
1737+ Checking cycle 22:
1738+ Building formula for cycle 23: vars 6949506 clauses 21693947
1739+ Checking cycle 23:
1740+ Building formula for cycle 24: vars 7057992 clauses 21994175
1741+ Checking cycle 24:
1742+ Building formula for cycle 25: vars 7407412 clauses 23047989
1743+ Checking cycle 25:
1744+ Building formula for cycle 26: vars 7797810 clauses 24221073
1745+ Checking cycle 26:
1746+ Building formula for cycle 27: vars 10939700 clauses 34732235
1747+ Checking cycle 27:
1748+ Building formula for cycle 28: vars 11268148 clauses 35780811
1749+ Checking cycle 28:
1750+ Building formula for cycle 29: vars 11268148 clauses 35780811
1751+ Checking cycle 29:
1752+ Building formula for cycle 30: vars 11268148 clauses 35780811
1753+ Checking cycle 30:
1754+ Building formula for cycle 31: vars 11376634 clauses 36081039
1755+ Checking cycle 31:
1756+ Building formula for cycle 32: vars 11726054 clauses 37134853
1757+ Checking cycle 32:
1758+ Building formula for cycle 33: vars 12116452 clauses 38307937
1759+ Checking cycle 33:
1760+ Building formula for cycle 34: vars 15258342 clauses 48819099
1761+ Checking cycle 34:
1762+ Building formula for cycle 35: vars 15586534 clauses 49867675
1763+ Checking cycle 35:
1764+ Building formula for cycle 36: vars 15619430 clauses 49965979
1765+ Checking cycle 36:
1766+ Finished in 3948.52
1767+ The execution is secure
1768+ ```
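When the run is scripted, the outcome can be read from the log rather than by eye. The following is a minimal sketch, assuming the log format matches the sample output above (`Building formula for cycle N: vars X clauses Y` lines followed by the final verdict). The function name and the returned field names are hypothetical and not part of Alma.

```python
import re

# Pattern for the per-cycle lines shown in the sample output above.
CYCLE_RE = re.compile(r"Building formula for cycle (\d+): vars (\d+) clauses (\d+)")


def summarize_alma_log(text):
    """Summarize an Alma log: cycles seen, largest formula, final verdict."""
    cycles = [(int(c), int(v), int(k)) for c, v, k in CYCLE_RE.findall(text)]
    return {
        "cycles_checked": len(cycles),
        "max_clauses": max((k for _, _, k in cycles), default=0),
        "secure": "The execution is secure" in text,
    }


# Last lines of the sample output above.
sample = """Building formula for cycle 35: vars 15586534 clauses 49867675
Checking cycle 35:
Building formula for cycle 36: vars 15619430 clauses 49965979
Checking cycle 36:
Finished in 3948.52
The execution is secure
"""
print(summarize_alma_log(sample))
```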
1769+
1770+Notes:
1771+
1772+* This analysis exercises the full data path of the GHASH block and comprises the following operations (controlled by a small [Verilator testbench](https://github.com/vogelpi/opentitan/blob/aes-gcm-review/hw/ip/aes/pre_sca/alma/cpp/verilator_tb_aes_ghash_wrap.cpp)):
1773+ + Initial clearing of all internal registers.
1774+ + Loading the hash subkey H.
1775+ + Loading the encrypted initial counter block S including the subsequent generation of repeatedly used correction terms.
1776+ + Processing a first AAD/ciphertext block including the generation of a correction term that is used for the first block only.
1777+ + Processing a second AAD/ciphertext block.
1778+ + Producing the final authentication tag.
1779+
1780+* The [following main changes have been implemented as a result of the formal netlist analysis using Alma](https://github.com/vogelpi/opentitan/commit/ac9333116cbe65fa6b868fe02cb17344d1e2717f) (refer to the [countermeasure spec](https://github.com/vogelpi/opentitan/blob/aes-gcm-review/hw/ip/aes/doc/theory_of_operation.md#mapping-the-masked-algorithm-to-the-hardware) for details):
1781+ + The result of the final addition of Share 1 of S and the unmasked GHASH state is no longer stored into the GHASH state register but directly forwarded to the output, and the state input to this addition is blanked.
1782+ The input multiplexer (`ghash_in_mux`) loses one input.
1783+ + The two 3-input multiplexers selecting the operands for the addition with the GHASH state (`add_in_mux`) are replaced by one-hot multiplexers with registered control signals.
1784+ + The Operand B inputs of both GF multipliers are now blanked.
1785+ The 3-input multiplexer selecting Operand B of the second GF multiplier is replaced by a one-hot multiplexer with registered control signal.
1786+ In addition, the last input slice of Operand B for this multiplier is registered.
1787+ This allows switching the multiplexer during the last clock cycle of the multiplication to avoid some undesirable transient leakage occurring upon saving the result of the multiplication into the GHASH state register (and this new value propagating through the multiplexer into the multiplier again).
1788+ + The GF multipliers are configured to output zero instead of Operand A (the hash subkey) while busy.
1789+ + The state input for the addition required for the generation of the correction term for Share 0 is blanked.
1790+ + Between adding the correction terms to the GHASH state for the last time and unmasking the GHASH state, a bubble cycle is added to allow signals to fully settle, thereby avoiding undesirable transient effects that unmask the uncorrected state shares.
1791+* The overall area impact of these changes is low (+0.16 kGE in Yosys + nangate45).
1792+* The final design successfully passes the formal masking verification.
1793+ For details regarding tool parameters, check the [analysis script](https://github.com/vogelpi/opentitan/blob/aes-gcm-review/hw/ip/aes/pre_sca/alma/verify_aes_ghash.sh).
1794+
1795+#### ChipWhisperer-based FPGA evaluation and TVLA
1796+
1797+To underpin the results of the formal verification flow, the hardening of the GHASH module has been analyzed on the ChipWhisperer [CW310](https://rtfm.newae.com/Targets/CW310%20Bergen%20Board/) FPGA board.
1798+For this analysis, power traces with the ChipWhisperer [Husky](https://rtfm.newae.com/Capture/ChipWhisperer-Husky/) scope were captured during GCM operations.
1799+Afterwards a Test Vector Leakage Assessment (TVLA) with the [ot-sca toolset](https://github.com/lowRISC/ot-sca) has been performed.
1800+The setup is illustrated in Figure 1.
1801+
1802+![](../images/caliptra-rtl/docs/images/cw310_cwhusky.jpeg)
1803+:--:
1804+**Figure 1**: Target CW310 FPGA board (left) and the CW Husky scope (right).
1805+
1806+##### Setup
1807+
1808+![](../images/caliptra-rtl/docs/images/GHASH_TVLA_Figure2.png)
1809+:--:
1810+**Figure 2**: Measurement setup. The main components are the target board, the scope, and the SCA framework.
1811+
1812+Figure 2 gives a detailed overview of the measurement setup that has been utilized to capture the power traces.
1813+The SCA evaluation framework ot-sca is the central component of the measurement setup.
1814+It is responsible for communicating with the penetration testing framework that runs on the target FPGA board and with the scope.
1815+Initially, ot-sca configures the scope (sample rate, number of samples) and the pentest framework (which input, how many encryptions, where to trigger).
1816+
1817+Based on the configuration, the pentest framework generates the cipher input, starts the encryption, and sends back the computed tag to ot-sca.
1818+The trigger is automatically set and unset by the AES hardware block to achieve an accurate & constant trigger window.
1819+In parallel, the scope waits for the trigger, captures the power consumption, and transfers the traces to the SCA evaluation framework.
1820+The ot-sca framework stores the trace as well as the cipher configuration in a database.
1821+
1822+![](../images/caliptra-rtl/docs/images/GHASH_TVLA_Figure3.png)
1823+:--:
1824+**Figure 3**: Power trace with AES encryption rounds visible (*left*). Aligned traces when zooming in (*right*).
1825+
1826+Figure 3 depicts power traces captured during AES-GCM encryptions with the setup above.
1827+As shown in the figure, the traces are well aligned, which allows a sound evaluation.
1828+
1829+##### Methodology
1830+
1831+To detect whether the hardened GHASH implementation effectively mitigates SCA attacks, the Test Vector Leakage Assessment (TVLA) approach discussed by Rambus in a [whitepaper](https://www.rambus.com/wp-content/uploads/2015/08/TVLA-DTR-with-AES.pdf) is adapted for the GCM mode of AES.
1832+In TVLA, Welch’s *t*-test is used to determine whether it is possible to statistically distinguish two power trace sets from each other.
1833+This test returns a value *t* for each sample, where a value of |*t*| > 4.5 means that, with a high probability, data-dependent leakage was detected.
1834+However, note that this test cannot provide any information about whether the leakage is actually exploitable.
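The per-sample test described above can be sketched in a few lines. This is an illustrative, standard-library-only implementation of Welch's *t*-test applied column by column, not the ot-sca implementation; the function names and the toy traces are hypothetical.

```python
import math
from statistics import mean, variance

TVLA_THRESHOLD = 4.5  # |t| threshold quoted in the text above


def welch_t(fixed_traces, random_traces):
    """Return Welch's t statistic for every sample index of two trace sets."""
    t_values = []
    for fixed_col, random_col in zip(zip(*fixed_traces), zip(*random_traces)):
        # Squared standard error of each mean (sample variance / set size).
        se_fixed = variance(fixed_col) / len(fixed_col)
        se_random = variance(random_col) / len(random_col)
        denom = math.sqrt(se_fixed + se_random)
        diff = mean(fixed_col) - mean(random_col)
        t_values.append(diff / denom if denom > 0 else 0.0)
    return t_values


def leaking_samples(t_values, threshold=TVLA_THRESHOLD):
    """Indices of samples whose |t| exceeds the threshold."""
    return [i for i, t in enumerate(t_values) if abs(t) > threshold]


# Toy traces with two samples each: sample 0 is identically distributed in
# both sets, sample 1 differs in its mean.
fixed = [[1.0, 5.0], [1.1, 5.1], [0.9, 4.9], [1.0, 5.0]]
rand = [[1.0, 6.0], [0.9, 6.1], [1.1, 5.9], [1.0, 6.0]]
print(leaking_samples(welch_t(fixed, rand)))  # prints [1]
```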
1835+
1836+![](../images/caliptra-rtl/docs/images/GHASH_TVLA_Figure4.png)
1837+:--
1838+**Figure 4:** TVLA plot showing leakage at around sample 1000. When increasing the number of traces (from 1000 to 10000), the leakage becomes more pronounced. Note that the traces shown in this plot are taken from an arbitrary cryptographic hardware block and not AES.
1839+
1840+Figure 4 shows an example of the TVLA plots used throughout this document. The red lines mark the *t*-test threshold of ±4.5.
1841+
1842+###### Dataset Generation for FvsR IV & Key
1843+
1844+In TVLA, two different trace data sets need to be recorded.
1845+As described in the [whitepaper](https://www.rambus.com/wp-content/uploads/2015/08/TVLA-DTR-with-AES.pdf), we generate these two trace data sets by using a fixed and a random AES-GCM cipher input set, *i.e.,* the fixed and the random set.
1846+
1847+| **Input** | **Fixed Set** | **Random Set** |
1848+| --- | --- | --- |
1849+| **Key** | STATIC | RANDOM |
1850+| **IV** | STATIC | RANDOM |
1851+| **PTX** | STATIC | STATIC |
1852+| **AAD** | STATIC | STATIC |
1853+
1854+
1855+As shown in the table above, for our experiment we use a static cipher input for the fixed set.
1856+For the random set, we use a PRNG to randomly generate the secrets, *i.e.,* key and IV, for each encryption.
1857+The dataset is generated directly on the device in the pentest framework.
1858+For each trace, ot-sca records which dataset the trace belongs to.
1859+
1860+With TVLA, the idea is to check whether we are able to distinguish power traces from the fixed and the random set.
1861+
1862+###### Dataset Generation for FvsR PTX & AAD
1863+
1864+For the second experiment, we use a static IV and key and calculate a FvsR PTX and AAD set:
1865+
1866+| **Input** | **Fixed Set** | **Random Set** |
1867+| --- | --- | --- |
1868+| **Key** | STATIC | STATIC |
1869+| **IV** | STATIC | STATIC |
1870+| **PTX** | STATIC | RANDOM |
1871+| **AAD** | STATIC | RANDOM |
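The two input tables above can be expressed as a small generator. This is a sketch only, not the pentest framework code: the field sizes, the all-zero static values, the random interleaving of fixed and random traces, and all names are assumptions made for illustration.

```python
import random

# Assumed field sizes in bytes and placeholder all-zero static values.
FIELD_SIZES = {"key": 16, "iv": 12, "ptx": 16, "aad": 16}
STATIC_INPUT = {name: bytes(size) for name, size in FIELD_SIZES.items()}

# Fields that are RANDOM in the random set of each experiment (see tables).
EXPERIMENTS = {
    "fvsr_iv_key": ("key", "iv"),
    "fvsr_ptx_aad": ("ptx", "aad"),
}


def make_input(rng, experiment, fixed):
    """Build one cipher input for the fixed or the random set."""
    cipher_input = dict(STATIC_INPUT)
    if not fixed:
        for name in EXPERIMENTS[experiment]:
            size = FIELD_SIZES[name]
            cipher_input[name] = bytes(rng.getrandbits(8) for _ in range(size))
    return cipher_input


def make_dataset(experiment, num_traces, seed=0):
    """Return (is_fixed, cipher_input) pairs, one per trace."""
    rng = random.Random(seed)
    dataset = []
    for _ in range(num_traces):
        fixed = rng.random() < 0.5  # assumption: sets are randomly interleaved
        dataset.append((fixed, make_input(rng, experiment, fixed)))
    return dataset


for is_fixed, cipher_input in make_dataset("fvsr_iv_key", 4):
    print("fixed " if is_fixed else "random", cipher_input["key"].hex())
```

Storing the `is_fixed` flag next to each input mirrors the per-trace set membership that the text says ot-sca records.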
1872+
1873+
1874+##### Results – FvsR IV & Key
1875+
1876+In the following, we discuss the analysis results for each GCM phase.
1877+We start with the results for the FvsR IV & Key datasets.
1878+
1879+![](../images/caliptra-rtl/docs/images/GHASH_TVLA_Figure5.png)
1880+:--:
1881+**Figure 5:** AES-GCM block diagram. Red lines mark the trigger windows for each analysis step.
1882+
1883+As shown in Figure 5, we focus on analyzing (*i*) the generation of the hash subkey H, (*ii*) the encryption of the initial counter block S, (*iii*) the processing of the AAD blocks, (*iv*) the plaintext blocks, and (*v*) the tag generation. Each measurement is conducted with (*a*) masks off and (*b*) masks on to analyze the effectiveness of the masking countermeasure.
1884+
1885+###### i) SCA Evaluation of Generating the Hash Subkey H
1886+
1887+![](../images/caliptra-rtl/docs/images/GHASH_TVLA_Figure6ab.png)
1888+:--:
1889+
1890+| **Figure 6a:** Masking Off - 100k traces - **Figure 6b:** Masking On - 1M traces |
1891+
1892+
1893+###### Interpretation
1894+
1895+The AES encryption is clearly visible in the form of 12 distinct peaks in the power traces shown in Figures 6a and 6b.
1896+The 12 peaks correspond to first the loading of the key and the all-zero block into the AES cipher core, followed by the initial round and the 10 full AES rounds (AES-128).
1897+They spread over approximately 470 samples which corresponds to the 56 target clock cycles a full AES-128 encryption takes.
1898+
1899+If the masking is turned off (Figure 6a), first- and second-order leakage is clearly visible throughout the operation.
1900+If the masking is on (Figure 6b), there is first-order leakage 1) at the beginning as well as 2) at the end of the operation.
1901+
1902+1. The leakage at the beginning of the operation is due to incrementing the IV/CTR value (inc32 function in GCM spec) which spreads across the first two AES rounds.
1903+ This produces first-order leakage as the inc32 function implementation isn’t masked.
1904+ It doesn’t need to be masked as the IV is not secret, just the encrypted initial counter block S (i.e., the encrypted IV) is secret in the context of GCM.
1905+2. The leakage at the end of the operation happens when the masked output of the AES cipher core, i.e., the masked hash subkey H, gets loaded in shares into the GHASH block.
1906+ When studying the RTL, one can see that there is nothing in the path between the AES cipher core and the hash subkey registers inside the GHASH block that could combine the shares and cause this leakage.
1907+ The leakage is most likely due to how the FPGA implementation tool maps the flip flops of the hash subkey register shares to the available FPGA logic slices: if flip flops of the different shares get mapped to the same logic slice, the carry-chain and other muxing logic present in the logic slice can combine the various inputs thereby causing SCA leakage despite these logic outputs not being used.
1908+ We’ve observed similar effects in the past and there is [research giving more insight into this and other FPGA-specific issues](https://ieeexplore.ieee.org/document/10545383).
1909+
1910+To summarize, the observed first-order leakage if masking is on (Figure 6b) is not of concern for ASIC implementations.
1911+
1912+###### ii) SCA Evaluation of Encrypting the Initial Counter Block
1913+
1914+![](../images/caliptra-rtl/docs/images/GHASH_TVLA_Figure7ab.png)
1915+:--:
1916+
1917+| **Figure 7a:** Masking Off - 100k traces - **Figure 7b:** Masking On - 1M traces |
1918+
1919+
1920+###### Interpretation
1921+
1922+Again, the AES encryption is clearly visible in the form of 12 peaks in the power traces shown in Figures 7a and 7b.
1923+This AES encryption corresponds to the generation of the encrypted initial counter block S.
1924+The AES encryption is followed by another operation visible in the power trace: the computation of repeatedly used correction terms using the Galois-field multipliers inside GHASH.
1925+This operation takes 33 target clock cycles (approximately 275 samples).
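As a quick consistency check, the sample counts and clock cycle counts quoted for the two operations imply nearly the same number of scope samples per target clock cycle. The sketch below uses only the approximate figures given in the text; the variable names are hypothetical.

```python
# Figures quoted in the text: AES-128 encryption and GHASH correction terms.
aes_samples, aes_cycles = 470, 56
ghash_samples, ghash_cycles = 275, 33

aes_rate = aes_samples / aes_cycles        # samples per target clock cycle
ghash_rate = ghash_samples / ghash_cycles  # samples per target clock cycle

print(f"{aes_rate:.2f} {ghash_rate:.2f}")  # prints "8.39 8.33"
```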
1926+
1927+If the masking is turned off (Figure 7a), first- and second-order leakage is clearly visible throughout both operations while being more pronounced during the GHASH operation.
1928+This is because the GHASH block is smaller and thus produces less noise.
1929+If the masking is on (Figure 7b), there is first-order leakage 1) at the beginning as well as 2) between the two operations.
1930+
1931+1. As before, the leakage at the beginning of the operation is due to incrementing the IV/CTR value (inc32 function in GCM spec) which spreads across the first two AES rounds.
1932+ This produces first-order leakage as the inc32 function implementation isn’t masked.
1933+ It doesn’t need to be masked as the IV is not secret, just the encrypted initial counter block S (i.e., the encrypted IV) is secret in the context of GCM.
1934+2. As before, the leakage at the end of the operation happens when the masked output of the AES cipher core, i.e., the encrypted initial counter block S, gets loaded in shares into the GHASH block.
1935+ When studying the RTL, one can see that there is nothing in the path between the AES cipher core and the GHASH state registers inside the GHASH block that could combine the shares and cause this leakage.
1936+ As before, the leakage is most likely due to how the FPGA implementation tool maps the multiplexers in front of the GHASH state registers to the available FPGA logic slices: Since the multiplexers for both shares use the same control signals, the multiplexing logic can be combined even into the same look-up tables (LUTs) thereby causing SCA leakage.
1937+ We’ve observed similar effects in the past and there is [research giving more insight into this and other FPGA-specific issues](https://ieeexplore.ieee.org/document/10545383).
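
As an illustration of the unmasked increment discussed in item 1, the following is a minimal Python sketch of the inc32 function as defined in the GCM specification (NIST SP 800-38D): only the rightmost 32 bits of the 128-bit block are incremented, modulo 2^32. It is a software model for orientation, not the RTL implementation; the helper name `inc32` and the example value are chosen for illustration only.

```python
def inc32(block: bytes) -> bytes:
    """Increment the rightmost 32 bits of a 16-byte block modulo 2**32."""
    if len(block) != 16:
        raise ValueError("expected a 128-bit block")
    prefix = block[:12]                           # leftmost 96 bits stay unchanged
    counter = int.from_bytes(block[12:], "big")   # rightmost 32 bits form the counter
    return prefix + ((counter + 1) % (1 << 32)).to_bytes(4, "big")


# Illustrative value: a 96-bit IV followed by an all-zero 32-bit counter.
iv = bytes.fromhex("deadbeefcafebaadf0cacc1a") + bytes(4)
next_block = inc32(iv)
print(next_block.hex())
```

Because the function depends only on the IV/CTR value, which is public in GCM, leaving it unmasked does not expose the key, the hash subkey H, or the encrypted initial counter block S.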
1938+
1939+To summarize, the observed first-order leakage if masking is on (Figure 7b) is not of concern for ASIC implementations.
1940+
1941+###### iii) SCA Evaluation of Processing the AAD Blocks
1942+
1943+###### Processing AAD Block 0
1944+
1945+![](../images/caliptra-rtl/docs/images/GHASH_TVLA_Figure8ab.png)
1946+:--:
1947+
1948+| **Figure 8a:** Masking Off - 50k traces - **Figure 8b:** Masking On - 10M traces |
1949+
1950+
1951+###### Interpretation
1952+
1953+For AAD blocks, the AES cipher core is not involved.
1954+However, during the computation of the first AAD block, the GHASH block needs to compute an additional correction term which is used for the very first block only.
1955+If the masking is turned off (Figure 8a), first- and second-order leakage is clearly visible but only for the first activity block.
1956+The second activity block involves computing the additional correction terms which requires Share 1 of the encrypted initial counter block to be multiplied by Share 1 of the hash subkey.
1957+But since the masking is off, both these values are zero for both the fixed and the random set and hence there is no SCA leakage.
1958+If the masking is turned on (Figure 8b), no SCA leakage is observable which is desirable.
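
The remark that Share 1 is zero when masking is off can be illustrated with a small sketch of first-order Boolean masking. This is a simplified model under assumptions: Python's `secrets` module stands in for the on-device pseudo-random number generator, and the function names are hypothetical rather than taken from the RTL.

```python
import secrets

BLOCK_BITS = 128


def split_into_shares(value: int, masking_on: bool) -> tuple[int, int]:
    """Split a 128-bit value into (share0, share1) with value == share0 ^ share1."""
    # With masking off, the mask (Share 1) is constant zero, so Share 0 carries
    # the plain value and any operation on Share 1 alone processes zeros.
    mask = secrets.randbits(BLOCK_BITS) if masking_on else 0
    return value ^ mask, mask


def recombine(share0: int, share1: int) -> int:
    """Recover the original value from its two shares."""
    return share0 ^ share1


secret = 0x0123456789ABCDEF0123456789ABCDEF
unmasked = split_into_shares(secret, masking_on=False)
masked = split_into_shares(secret, masking_on=True)
print(hex(unmasked[1]), recombine(*masked) == secret)
```

With masking off, a multiplication of Share 1 by Share 1 multiplies zero by zero for every trace in both the fixed and the random set, which is consistent with the absence of leakage in the second activity block of Figure 8a.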
1959+
1960+###### Processing AAD Block 1
1961+
1962+![](../images/caliptra-rtl/docs/images/GHASH_TVLA_Figure9ab.png)
1963+:--:
1964+
1965+| **Figure 9a:** Masking Off - 50k traces - **Figure 9b:** Masking On - 10M traces |
1966+
1967+
1968+###### Interpretation
1969+
1970+For the second AAD block (and any subsequent AAD blocks) there is only one activity block corresponding to the Galois-field multiplication.
1971+If masking is turned off (Figure 9a), there is both first- and second-order leakage observable.
1972+If the masking is turned on (Figure 9b), no SCA leakage is observable which is desirable.
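
For orientation, the Galois-field multiplication that makes up this activity block can be modelled in a few lines of Python. The sketch follows the bit ordering of the multiplication algorithm in NIST SP 800-38D and is unmasked; it is a functional reference under those assumptions, not a description of the masked hardware multipliers, and the names `gf128_mul` and `ghash_update` are illustrative.

```python
# Reduction constant R = 11100001 || 0^120 from NIST SP 800-38D.
R = 0xE1 << 120


def gf128_mul(x: int, y: int) -> int:
    """Multiply two 128-bit blocks in GF(2^128) using GCM's bit ordering."""
    z, v = 0, y
    for i in range(128):
        if (x >> (127 - i)) & 1:      # bits of x are consumed leftmost first
            z ^= v
        v = (v >> 1) ^ R if v & 1 else v >> 1
    return z


def ghash_update(state: int, block: int, h: int) -> int:
    """Absorb one 128-bit AAD or ciphertext block into the GHASH state."""
    return gf128_mul(state ^ block, h)


h = 0x66E94BD4EF8A2C3B884CFA59CA342B2E   # arbitrary example value for H
aad0 = int.from_bytes(bytes([0xCC] * 16), "big")
aad1 = int.from_bytes(bytes([0xDD] * 16), "big")
state = ghash_update(ghash_update(0, aad0, h), aad1, h)
print(f"{state:032x}")
```

Each AAD block after the first costs exactly one such multiplication, which matches the single activity block visible in Figures 9a and 9b.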
1973+
1974+###### iv) SCA Evaluation of Processing the PTX Blocks
1975+
1976+###### Processing PTX Block 0
1977+
1978+![](../images/caliptra-rtl/docs/images/GHASH_TVLA_Figure10ab.png)
1979+:--:
1980+
1981+| **Figure 10a:** Masking Off - 50k traces - **Figure 10b:** Masking On - 1M traces |
1982+
1983+
1984+###### Interpretation
1985+
1986+Like in [ii) SCA Evaluation of Encrypting the Initial Counter Block](#ii-sca-evaluation-of-encrypting-the-initial-counter-block) there is first-order leakage 1) at the beginning and 2) between the two operations if the masking is turned on (Figure 10b).
1987+
1988+1. As before, the leakage at the beginning of the operation is due to incrementing the IV/CTR value (inc32 function in GCM spec) which spreads across the first two AES rounds.
1989+ This produces first-order leakage as the inc32 function implementation isn’t masked.
1990+ It doesn’t need to be masked as the IV is not secret, just the encrypted initial counter block S (i.e., the encrypted IV) is secret in the context of GCM.
1991+2. The leakage between the two operations is due to the unmasking of the AES cipher core output, the addition of input data to produce the ciphertext, and writing this value to the GHASH block and the output data registers.
1992+ It’s not related to the hash subkey H or the initial counter block S (i.e. the two secrets involved in the GHASH part of GCM).
1993+ But since the AAD and the plaintext have been chosen to be the same for all traces in the fixed and the random sets, the traces of the fixed set all produce the same ciphertext and thus are expected to exhibit a static power signature for this step, whereas the ciphertext of the random set is randomized through the random key and IV.
1994+ However, since the ciphertext is not secret in the context of GCM, this leakage is of no concern.
1995+
1996+To summarize, the observed first-order leakage if masking is on (Figure 10b) is not of concern.
1997+
1998+###### Processing PTX Block 1
1999+
2000+![](../images/caliptra-rtl/docs/images/GHASH_TVLA_Figure11ab.png)
2001+:--:
2002+
2003+| **Figure 11a:** Masking Off - 50k traces - **Figure 11b:** Masking On - 1M traces |
2004+
2005+
2006+###### Interpretation
2007+
2008+As before (PTX Block 0), there is some first-order leakage observable when the masking is turned on.
2009+For the same reasons as before, this leakage is not of concern.
2010+
2011+###### v) SCA Evaluation of the Tag Generation
2012+
2013+![](../images/caliptra-rtl/docs/images/GHASH_TVLA_Figure12ab.png)
2014+:--:
2015+
2016+| **Figure 12a:** Masking Off - 50k traces - **Figure 12b:** Masking On - 1M traces |
2017+
2018+
2019+###### Interpretation
2020+
2021+The generation of the final authentication tag consists of two operations.
2022+1) The 128-bit block containing the AAD and ciphertext lengths is hashed and the correction terms are added.
2023+ The GHASH state is unmasked (still masked with the encrypted initial counter block S) and Share 1 of S is added to write the final authentication tag to the data output registers readable by software.
2024+2) In parallel to writing the final authentication tag to the data output registers, the internal state is all cleared to random values and an additional multiplication is triggered to clear the internal state of the Galois-field multipliers and the correction term registers.
2025+
2026+If masking is turned off (Figure 12a), there is both first- and second-order leakage observable during the first activity block (tag generation) but not during the clearing operation.
2027+If the masking is turned on (Figure 12b), some SCA leakage is observable between the two operations, i.e., when the final authentication tag is written to the output data registers.
2028+This leakage is expected as both the fixed and the random data sets use a static AAD and plaintext.
2029+This means the tag for the fixed data set is fixed whereas the tags for the random set get randomized through the ciphertext (random due to the random key and IV).
2030+
2031+To summarize, the observed first-order leakage if masking is on (Figure 12b) is not of concern.
2032+
2033+##### Results – FvsR PTX & AAD
2034+
2035+In the following, we discuss the analysis results for each of the FvsR PTX & AAD datasets.
2036+These experiments were specifically done to investigate leakage peaks identified for the FvsR Key & IV datasets that are attributed to how the FPGA implementation tool maps flip flops and multiplexer shares to the available FPGA logic slices.
2037+
2038+###### i) SCA Evaluation of Generating the Hash Subkey H
2039+
2040+![](../images/caliptra-rtl/docs/images/GHASH_TVLA_Figure13ab.png)
2041+:--:
2042+
2043+| **Figure 13a:** Masking Off - 50k traces - **Figure 13b:** Masking On - 1M traces |
2044+
2045+
2046+###### Interpretation
2047+
2048+There is no SCA leakage visible in either case, without masking (Figure 13a) or with masking turned on (Figure 13b).
2049+This is expected as the hash subkey generation doesn’t involve the plaintext or the AAD but only the key and IV.
2050+Both the fixed and random set use the same static key and IV.
2051+
2052+This experiment was specifically done to check whether the leakage identified in Figure 6b can indeed be attributed to how the FPGA implementation tool maps the flip flops of the hash subkey register shares to the available FPGA logic slices.
2053+As expected, the leakage peak is now gone.
2054+
2055+###### ii) SCA Evaluation of Encrypting the Initial Counter Block
2056+
2057+![](../images/caliptra-rtl/docs/images/GHASH_TVLA_Figure14ab.png)
2058+:--:
2059+
2060+| **Figure 14a:** Masking Off - 50k traces - **Figure 14b:** Masking On - 1M traces |
2061+
2062+
2063+###### Interpretation
2064+
2065+There is no SCA leakage visible in either case, without masking (Figure 14a) or with masking turned on (Figure 14b).
2066+This is expected as the encryption of the initial counter block and the subsequent computation of repeatedly used correction terms don’t involve the plaintext or the AAD but only the key and IV.
2067+Both the fixed and random set use the same static key and IV.
2068+
2069+This experiment was specifically done to check whether the leakage identified in Figure 7b can indeed be attributed to how the FPGA implementation tool maps the multiplexers in front of the GHASH state registers to the available FPGA logic slices.
2070+As expected, the leakage peak is now gone.
2071+
2072+###### iv) SCA Evaluation of Processing the PTX Block 0
2073+
2074+![](../images/caliptra-rtl/docs/images/GHASH_TVLA_Figure15ab.png)
2075+:--:
2076+
2077+| **Figure 15a:** Masking Off - 100k traces - **Figure 15b:** Masking On - 1M traces |
2078+
2079+
2080+###### Interpretation
2081+
2082+With the masking turned off (Figure 15a), there is first-order leakage 1) at the beginning of the operation and 2) throughout the entire GHASH operation.
2083+
2084+1. The leakage at the beginning of the operation is due to the input data (the plaintext) being written to an internal buffer register.
2085+ The AES cipher is operated in counter mode, meaning it doesn’t encrypt the input data but the counter value (incremented IV).
2086+ Because the IV is fixed for both the fixed and the random data set, no leakage is observed during the AES encryption even if the masking is off.
2087+ At the end of the AES encryption, the output of the AES cipher core is added to the content of the buffer register to produce the ciphertext which is then forwarded to the GHASH block and to the data output registers.
2088+2. The GHASH operation then processes this ciphertext.
2089+ The observed leakage when the masking is off is expected.
2090+
2091+With the masking turned on (Figure 15b), the first-order leakage at the beginning of the operation remains visible. The reason for this is that the internal register buffering the previous input data is not masked.
2092+This is of no concern as the leakage is not related to key or IV.
2093+
2094+Another first-order leakage peak is visible between the AES encryption and the GHASH operation.
2095+This leakage is due to the unmasked AES cipher core output being added to the input data (coming from the internal buffer register) and the result being stored to the output data register.
2096+As key and IV are static and identical for both the fixed and the random data set, the cipher core output is the same for both sets.
2097+Any difference in the power signature between the two sets is due to the different plaintext / ciphertext.
2098+Again, this is to be expected and of no concern as the ciphertext is not secret in the context of GCM.
2099+
2100+#### Reproducing the FPGA Experiments
2101+
2102+##### Prerequisites
2103+
2104+###### (i) Setting up the CW310 and CW Husky
2105+
2106+Please follow the guide [here](https://github.com/lowRISC/ot-sca/blob/master/doc/getting_started.md#cw310) to prepare the CW310 and CW Husky for the SCA measurements.
2107+
2108+###### (ii) Generating the FPGA Bitstream
2109+
2110+Follow the guide [here](https://opentitan.org/book/doc/getting_started/install_vivado/index.html) to install Xilinx Vivado. Please note that a valid license is needed to generate bitstreams for the CW310 FPGA board.
2111+
2112+Then, build the bitstream from the [aes-gcm-sca-bitstream](https://github.com/vogelpi/opentitan/tree/aes-gcm-sca-bitstream) branch.
2113+This branch includes the AES-GCM and applies several optimizations (disabling certain features to reduce the area utilization) to improve the SCA measurements.
2114+```sh
2115+git clone https://github.com/vogelpi/opentitan.git
2116+cd opentitan
2117+git checkout aes-gcm-sca-bitstream
2118+./bazelisk.sh build //hw/bitstream/vivado:fpga_cw310_test_rom
2119+cp bazel-bin/hw/bitstream/vivado/build.fpga_cw310/synth-vivado/lowrisc_systems_chip_earlgrey_cw310_0.1.bit .
2120+```
2121+
2122+The resulting bitstream is `lowrisc_systems_chip_earlgrey_cw310_0.1.bit`.
2123+
2124+###### (iii) Compiling the Penetration Testing Binary
2125+
2126+The penetration testing binary that is running on the target is the framework that receives commands from the side-channel evaluation framework and triggers the AES-GCM operations.
2127+```sh
2128+git clone https://github.com/vogelpi/opentitan.git
2129+cd opentitan
2130+git checkout aes-gcm-review
2131+./bazelisk.sh build //sw/device/tests/penetrationtests/firmware:firmware_fpga_cw310_test_rom
2132+cp bazel-bin/sw/device/tests/penetrationtests/firmware/firmware_fpga_cw310_test_rom_fpga_cw310_test_rom.bin sca_ujson_fpga_cw310.bin
2133+```
2134+
2135+The resulting penetration testing binary is `sca_ujson_fpga_cw310.bin`.
2136+
2137+###### (iv) Setting up the Side-Channel Evaluation Framework
2138+
2139+Clone the ot-sca repository and switch to the dedicated AES-GCM branch:
2140+```sh
2141+git clone https://github.com/lowRISC/ot-sca.git
2142+cd ot-sca
2143+git checkout ot-sca-aes-gcm
2144+```
2145+
2146+Then, follow [this](https://github.com/lowRISC/ot-sca/blob/master/doc/getting_started.md#installing-on-a-machine) guideline to prepare your machine for the measurements.
2147+
2148+Afterwards, copy the bitstream to `ot-sca/objs/lowrisc_systems_chip_earlgrey_cw310_0.1.bit` and the binary to `ot-sca/objs/sca_ujson_fpga_cw310.bin`.
2149+
2150+Finally, determine the port the CW310 opened on your machine (e.g., `/dev/ttyACM2`) and set it accordingly in the `port` field of the `ot-sca/capture/configs/aes_gcm_sca_cw310.yaml` configuration file.
2151+
2152+##### Capturing Traces
2153+
2154+After fulfilling the prerequisites, traces can be captured using ot-sca.
2155+To configure the measurement, adapt the script located in `ot-sca/capture/configs/aes_gcm_sca_cw310.yaml`.
2156+The following parameters can be changed:
2157+```yml
2158+husky:
2159+ # Number of encryptions performed in one batch.
2160+ num_segments: 35
2161+ # Number of cycles that are captured by the CW Husky.
2162+ num_cycles: 320
2163+capture:
2164+ # Number of traces to capture.
2165+ num_traces: 100000
2166+ # Number of traces to keep in memory before flushing to the disk.
2167+ trace_threshold: 50000
2168+test:
2169+ # Values used for the fixed set.
2170+ iv_fixed: [0xDE, 0xAD, 0xBE, 0xEF, 0xCA, 0xFE, 0xBA, 0xAD, 0xF0, 0xCA,
2171+ 0xCC, 0x1A, 0x00, 0x00, 0x00, 0x00]
2172+ key_fixed: [0x81, 0x1E, 0x37, 0x31, 0xB0, 0x12, 0x0A, 0x78, 0x42, 0x78,
2173+ 0x1E, 0x22, 0xB2, 0x5C, 0xDD, 0xF9]
2174+ # Static values that are used by the fixed and the random set.
2175+ ptx_blocks: 2
2176+ ptx_static: [[0xAA, 0xAA, 0xAA, 0xAA, 0xAA, 0xAA, 0xAA, 0xAA, 0xAA, 0xAA,
2177+ 0xAA, 0xAA, 0xAA, 0xAA, 0xAA, 0xAA], [0xBB, 0xBB, 0xBB,
2178+ 0xBB, 0xBB, 0xBB, 0xBB, 0xBB, 0xBB, 0xBB, 0xBB, 0xBB, 0xBB,
2179+ 0xBB, 0xBB, 0xBB]]
2180+ ptx_last_block_len_bytes: 16
2181+ aad_blocks: 2
2182+ aad_static: [[0xCC, 0xCC, 0xCC, 0xCC, 0xCC, 0xCC, 0xCC, 0xCC, 0xCC, 0xCC,
2183+ 0xCC, 0xCC, 0xCC, 0xCC, 0xCC, 0xCC], [0xDD, 0xDD, 0xDD,
2184+ 0xDD, 0xDD, 0xDD, 0xDD, 0xDD, 0xDD, 0xDD, 0xDD, 0xDD,
2185+ 0xDD, 0xDD, 0xDD, 0xDD]]
2186+ aad_last_block_len_bytes: 16
2187+ # Trigger configuration (select only one).
2188+ # [Hash sub key, Init. block, AAD block, PTX block, TAG block]
2189+ triggers: [False, False, False, False, True]
2190+ # Which AAD or PTX block. 0 = first block.
2191+ trigger_block: 0
2192+ # 32-bit seed for masking on device. To switch off the masking, use 0
2193+ # as an LFSR seed.
2194+ lfsr_seed: 0x00000000
2195+ #lfsr_seed: 0xdeadbeef
2196+```
2197+
2198+After tweaking the configuration, the traces can be captured by executing:
2199+
2200+```sh
2201+cd capture
2202+./capture_aes_gcm.py -c configs/aes_gcm_sca_cw310.yaml -p aes_gcm_sca
2203+```
2204+
2205+Here, the `-c` parameter specifies the configuration file and `-p` the database in which the traces are stored.
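
Before starting a long capture, it can help to sanity-check the `test` section of the configuration. The following Python sketch is not part of ot-sca; the function name and the checks are assumptions derived from the comments in the configuration shown above (exactly one trigger selected, 16-byte key and IV as in the example, an LFSR seed of 0 meaning masking off).

```python
def check_test_config(test: dict) -> str:
    """Validate a few fields of the `test` section; return "off" or "on" for masking."""
    triggers = test["triggers"]
    if len(triggers) != 5 or sum(map(bool, triggers)) != 1:
        raise ValueError("exactly one of the five triggers must be selected")
    for name in ("iv_fixed", "key_fixed"):
        if len(test[name]) != 16:  # assumption: 16-byte values as in the example
            raise ValueError(f"{name} must contain 16 bytes")
    for kind in ("ptx", "aad"):
        if len(test[f"{kind}_static"]) != test[f"{kind}_blocks"]:
            raise ValueError(f"{kind}_static does not match {kind}_blocks")
        if not 1 <= test[f"{kind}_last_block_len_bytes"] <= 16:
            raise ValueError(f"{kind}_last_block_len_bytes must be in 1..16")
    # An LFSR seed of 0 switches the masking off.
    return "off" if test["lfsr_seed"] == 0 else "on"


example = {
    "iv_fixed": [0xDE, 0xAD, 0xBE, 0xEF, 0xCA, 0xFE, 0xBA, 0xAD,
                 0xF0, 0xCA, 0xCC, 0x1A, 0x00, 0x00, 0x00, 0x00],
    "key_fixed": [0x81, 0x1E, 0x37, 0x31, 0xB0, 0x12, 0x0A, 0x78,
                  0x42, 0x78, 0x1E, 0x22, 0xB2, 0x5C, 0xDD, 0xF9],
    "ptx_blocks": 2,
    "ptx_static": [[0xAA] * 16, [0xBB] * 16],
    "ptx_last_block_len_bytes": 16,
    "aad_blocks": 2,
    "aad_static": [[0xCC] * 16, [0xDD] * 16],
    "aad_last_block_len_bytes": 16,
    "triggers": [False, False, False, False, True],
    "trigger_block": 0,
    "lfsr_seed": 0x00000000,
}
masking = check_test_config(example)
print(f"masking is {masking}")
```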
2206+
2207+##### Performing the TVLA
2208+
2209+After capturing the traces, the TVLA can be performed by switching into the `ot-sca/analysis` folder, copying the `ot-sca/analysis/configs/tvla_cfg_kmac.yaml` file to `ot-sca/analysis/configs/tvla_cfg_aes_gcm.yaml`, and modifying the configuration file:
2210+```yml
2211+project_file: ../capture/projects/aes_gcm_sca
2212+trace_file: null
2213+trace_start: null
2214+trace_end: null
2215+leakage_file: null
2216+save_to_disk: null
2217+save_to_disk_ttest: null
2218+round_select: null
2219+byte_select: null
2220+input_histogram_file: null
2221+output_histogram_file: null
2222+number_of_steps: 1
2223+ttest_step_file: null
2224+plot_figures: true
2225+test_type: "GENERAL_KEY"
2226+mode: aes
2227+filter_traces: true
2228+trace_threshold: 50000
2229+trace_db: ot_trace_library
2230+```
2231+
2232+By calling
2233+```sh
2234+./tvla.py --cfg-file tvla_cfg_aes_gcm.yaml run-tvla
2235+```
2236+the TVLA plot is generated.
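
To make the statistic behind the plot concrete, the following is a minimal pure-Python sketch of a first-order fixed-vs-random Welch t-test, evaluated independently for each sample point. It is a simplified illustration, not the implementation in `tvla.py`: the function names are hypothetical, and the threshold of 4.5 is the value commonly used for TVLA, assumed here rather than read from the configuration.

```python
import math
from statistics import fmean, variance

TVLA_THRESHOLD = 4.5  # commonly used pass/fail bound for |t| (assumption)


def welch_t(fixed, random_) -> float:
    """Welch's t-statistic for two sets of measurements at one sample point."""
    std_err = math.sqrt(variance(fixed) / len(fixed) + variance(random_) / len(random_))
    return (fmean(fixed) - fmean(random_)) / std_err if std_err > 0 else 0.0


def tvla_first_order(fixed_traces, random_traces) -> list[float]:
    """Return one t-value per sample point; each trace is a list of samples."""
    columns = zip(zip(*fixed_traces), zip(*random_traces))
    return [welch_t(f, r) for f, r in columns]


# Toy data: sample 0 behaves identically in both sets, sample 1 differs.
fixed_set = [[1.0, 5.0], [1.2, 5.0], [0.8, 5.0]]
random_set = [[1.0, 2.0], [1.2, 2.2], [0.8, 1.8]]
t_values = tvla_first_order(fixed_set, random_set)
leaky = [i for i, t in enumerate(t_values) if abs(t) > TVLA_THRESHOLD]
print(t_values, leaky)
```

A second-order analysis can be sketched the same way by first subtracting the per-sample mean from every trace and squaring the result before applying the t-test.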
2237+
14592238 ## PCR vault
14602239
14612240 * Platform Configuration Register (PCR) vault is a register file that stores measurements to be used by the microcontroller.
1462-* PCR entries are read-only registers of 384 bits each.
2241+* PCR entries are read-only registers of 512 bits each.
14632242 * Control bits allow for entries to be cleared by FW, which sets their values back to 0.
14642243 * A lock bit can be set by FW to prevent the entry from being cleared. The lock bit is sticky and only resets on a powergood cycle.
14652244
@@ -1490,23 +2269,23 @@
14902269
14912270 ## Key vault
14922271
1493-Key Vault (KV) is a register file that stores the keys to be used by the microcontroller, but this register file is not observed by the microcontroller. Each cryptographic function has a control register and functional block designed to read from and write to the KV. 
2272+Key Vault (KV) is a register file that stores the keys to be used by the microcontroller, but this register file is not observed by the microcontroller. Each cryptographic function has a control register and functional block designed to read from and write to the KV. 
14942273
14952274 | KV register | Description |
14962275 | :-------------------------------- | :-------------------------------------------------------- |
1497-| Key Control\[31:0\] | 32 Control registers, 32 bits each |
1498-| Key Entry\[31:0\]\[11:0\]\[31:0\] | 32 Key entries, 384 bits each <br>No read or write access |
2276+| Key Control\[23:0\] | 24 Control registers, 32 bits each |
2277+| Key Entry\[23:0\]\[15:0\]\[31:0\] | 24 Key entries, 512 bits each <br>No read or write access |
14992278
15002279
15012280 ### Key vault functional block
15022281
1503-Keys and measurements are stored in 512b register files. These have no read or write path from the microcontroller. The entries are read through a passive read mux driven by each cryptographic block. Locked entries return zeroes. 
1504-
1505-Entries in the KV must be cleared via control register, or by de-assertion of pwrgood.  
1506-
1507-Each entry has a control register that is writable by the microcontroller. 
1508-
1509-The destination valid field is programmed by FW in the cryptographic block generating the key, and it is passed here at generation time. This field cannot be modified after the key is generated and stored in the KV. 
2282+Keys and measurements are stored in 512b register files. These have no read or write path from the microcontroller. The entries are read through a passive read mux driven by each cryptographic block. Locked entries return zeroes. 
2283+
2284+Entries in the KV must be cleared via control register, or by de-assertion of pwrgood.  
2285+
2286+Each entry has a control register that is writable by the microcontroller. 
2287+
2288+The destination valid field is programmed by FW in the cryptographic block generating the key, and it is passed here at generation time. This field cannot be modified after the key is generated and stored in the KV. 
15102289
15112290 | KV Entry Ctrl Fields | Reset | Description |
15122291 | --------------------------- | ------------------- | ------------------------ |
@@ -1515,11 +2294,11 @@
15152294 | Clear\[2\] | cptra_rst_b | If unlocked, setting the clear bit causes KV to clear the associated entry. The clear bit is reset after entry is cleared. |
15162295 | Copy\[3\] | cptra_rst_b | ENHANCEMENT: Setting the copy bit causes KV to copy the key to the entry written to Copy Dest field. |
15172296 | Copy Dest\[8:4\] | cptra_rst_b | ENHANCEMENT: Destination entry for the copy function. |
1518-| Dest_valid\[16:9\] | hard_reset_b | KV entry can be used with the associated cryptographic block if the appropriate index is set. <br>\[0\] - HMAC KEY <br>\[1\] - HMAC BLOCK <br>\[2\] - SHA BLOCK <br>\[2\] - ECC PRIVKEY <br>\[3\] - ECC SEED <br>\[7:5\] - RSVD |
2297+| Dest_valid\[16:9\] | hard_reset_b | KV entry can be used with the associated cryptographic block if the appropriate index is set. <br>\[0\] - HMAC KEY <br>\[1\] - HMAC BLOCK <br>\[2\] - MLDSA SEED <br>\[3\] - ECC PRIVKEY <br>\[4\] - ECC SEED <br>\[5\] - AES KEY <br>\[7:6\] - RSVD |
15192298 | last_dword\[20:19\] | hard_reset_b | Store the offset of the last valid dword, used to indicate the last cycle for read operations. |
15202299
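
As a reading aid for the version 2.0 `Dest_valid` layout in the table above, the following Python sketch decodes bits \[16:9\] of a KV entry control value into the list of cryptographic blocks that may consume the entry. The function and constant names are illustrative and not part of the hardware or firmware interface.

```python
DEST_VALID_LSB = 9       # Dest_valid occupies bits [16:9] of the control register
DEST_VALID_WIDTH = 8

# Index within Dest_valid -> destination, per the version 2.0 table; [7:6] are reserved.
DEST_VALID_NAMES = [
    "HMAC KEY",
    "HMAC BLOCK",
    "MLDSA SEED",
    "ECC PRIVKEY",
    "ECC SEED",
    "AES KEY",
]


def decode_dest_valid(ctrl: int) -> list[str]:
    """Return the destinations enabled in a 32-bit KV entry control value."""
    field = (ctrl >> DEST_VALID_LSB) & ((1 << DEST_VALID_WIDTH) - 1)
    return [name for bit, name in enumerate(DEST_VALID_NAMES) if (field >> bit) & 1]


# Example: an entry usable as an HMAC key and as an AES key.
ctrl_value = ((1 << 0) | (1 << 5)) << DEST_VALID_LSB
print(decode_dest_valid(ctrl_value))
```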
15212300
1522-### Key vault cryptographic functional block 
2301+### Key vault cryptographic functional block 
15232302
15242303 A generic block is instantiated in each cryptographic block to enable access to KV. 
15252304
@@ -1551,10 +2330,11 @@
15512330 | write_entry\[5:1\] | Key vault entry to store the result. |
15522331 | hmac_key_dest_valid\[6\] | HMAC KEY is a valid destination. |
15532332 | hmac_block_dest_valid\[7\] | HMAC BLOCK is a valid destination. |
1554-| sha_block_dest_valid\[8\] | SHA BLOCK is a valid destination. |
2333+| mldsa_seed_dest_valid\[8\] | MLDSA SEED is a valid destination. |
15552334 | ecc_pkey_dest_valid\[9\] | ECC PKEY is a valid destination. |
15562335 | ecc_seed_dest_valid\[10\] | ECC SEED is a valid destination. |
1557-| rsvd\[31:11\] | Reserved field |
2336+| aes_key_dest_valid\[11\] | AES KEY is a valid destination. |
2337+| rsvd\[31:12\] | Reserved field |
15582338
15592339
15602340 | KV Status Reg | Description |
@@ -1583,12 +2363,12 @@
15832363
15842364 ### Key vault de-obfuscation block operation
15852365
1586-A de-obfuscation engine (DOE) is used in conjunction with AES cryptography to de-obfuscate the UDS and field entropy.  
1587-
1588-1. The obfuscation key is driven to the AES key. The data to be decrypted (either obfuscated UDS or obfuscated field entropy) is fed into the AES data. 
1589-2. An FSM manually drives the AES engine and writes the decrypted data back to the key vault. 
1590-3. FW programs the DOE with the requested function (UDS or field entropy de-obfuscation), and the destination for the result. 
1591-4. After de-obfuscation is complete, FW can clear out the UDS and field entropy values from any flops until cptra\_pwrgood de-assertion.  
2366+A de-obfuscation engine (DOE) is used in conjunction with AES cryptography to de-obfuscate the UDS and field entropy.  
2367+
2368+1. The obfuscation key is driven to the AES key. The data to be decrypted (either obfuscated UDS or obfuscated field entropy) is fed into the AES data. 
2369+2. An FSM manually drives the AES engine and writes the decrypted data back to the key vault. 
2370+3. FW programs the DOE with the requested function (UDS or field entropy de-obfuscation), and the destination for the result. 
2371+4. After de-obfuscation is complete, FW can clear out the UDS and field entropy values from any flops until cptra\_pwrgood de-assertion.  
15922372
15932373 The following tables describe DOE register and control fields.
15942374
@@ -1605,13 +2385,13 @@
16052385 | DEST\[4:2\] | Cptra_rst_b | Destination register for the result of the de-obfuscation flow. Field entropy writes into DEST and DEST+1 <br>Key entry only, can’t go to PCR . |
16062386
16072387
1608-### Key vault de-obfuscation flow 
1609-
1610-1. ROM loads IV into DOE. ROM writes to the DOE control register the destination for the de-obfuscated result and sets the appropriate bit to run UDS and/or the field entropy flow. 
1611-2. DOE state machine takes over and loads the Caliptra obfuscation key into the key register. 
1612-3. Next, either the obfuscated UDS or field entropy are loaded into the block register 4 DWORDS at a time. 
1613-4. Results are written to the KV entry specified in the DEST field of the DOE control register. 
1614-5. State machine resets the appropriate RUN bit when the de-obfuscated key is written to KV. FW can poll this register to know when the flow is complete.
2388+### Key vault de-obfuscation flow 
2389+
2390+1. ROM loads IV into DOE. ROM writes to the DOE control register the destination for the de-obfuscated result and sets the appropriate bit to run UDS and/or the field entropy flow. 
2391+2. DOE state machine takes over and loads the Caliptra obfuscation key into the key register. 
2392+3. Next, either the obfuscated UDS or field entropy are loaded into the block register 4 DWORDS at a time. 
2393+4. Results are written to the KV entry specified in the DEST field of the DOE control register. 
2394+5. State machine resets the appropriate RUN bit when the de-obfuscated key is written to KV. FW can poll this register to know when the flow is complete.
16152395 6. The clear obf secrets command flushes the obfuscation key, the obfuscated UDS, and the field entropy from the internal flops. This should be done by ROM after both de-obfuscation flows are complete.
16162396
16172397 ## Data vault
@@ -1626,7 +2406,7 @@
16262406
16272407 ## Cryptographic blocks fatal and non-fatal errors
16282408
1629-The following table describes cryptographic errors.
2409+The following table describes cryptographic errors.
16302410
16312411 | Errors | Error type | Description |
16322412 | :----------- | :----------------- | :-------------------------------------------------------------------------------------------------------------------------------------------------------- |
@@ -1654,6 +2434,7 @@
16542434 | DRBG | Deterministic Random Bit Generator |
16552435 | DWORD | 32-bit (4-byte) data element |
16562436 | ECDSA | Elliptic Curve Digital Signature Algorithm |
2437+| ECDH | Elliptic Curve Diffie-Hellman Key Exchange |
16572438 | FMC | FW First Mutable Code |
16582439 | FSM | Finite State Machine |
16592440 | GPU | Graphics Processing Unit |
@@ -1693,20 +2474,21 @@
16932474
16942475 # References
16952476
1696-1. J. Strömbergson, "Secworks," \[Online\]. Available at https://github.com/secworks.
1697-2. NIST, Federal Information Processing Standards Publication (FIPS PUB) 180-4 Secure Hash Standard (SHS).
1698-3. OpenSSL \[Online\]. Available at https://www.openssl.org/docs/man3.0/man3/SHA512.html.
1699-4. N. W. Group, RFC 3394, Advanced Encryption Standard (AES) Key Wrap Algorithm, 2002.
1700-5. NIST, Federal Information Processing Standards Publication (FIPS) 198-1, The Keyed-Hash Message Authentication Code, 2008.
1701-6. N. W. Group, RFC 4868, Using HMAC-SHA256, HMAC-SHA384, and HMAC-SHA512 with IPsec, 2007.
1702-7. RFC 6979, Deterministic Usage of the Digital Signature Algorithm (DSA) and Elliptic Curve Digital Signature Algorithm (ECDSA), 2013.
1703-8. TCG, Hardware Requirements for a Device Identifier Composition Engine, 2018.
1704-9. Coron, J.-S.: Resistance against differential power analysis for elliptic curve cryptosystems. In: Koç, Ç.K., Paar, C. (eds.) CHES 1999. LNCS, vol. 1717, pp. 292–302.
1705-10. Schindler, W., Wiemers, A.: Efficient side-channel attacks on scalar blinding on elliptic curves with special structure. In: NIST Workshop on ECC Standards (2015).
1706-11. National Institute of Standards and Technology, "Digital Signature Standard (DSS)", Federal Information Processing Standards Publication (FIPS PUB) 186-4, July 2013.
2477+1. J. Strömbergson, "Secworks," \[Online\]. Available at https://github.com/secworks.
2478+2. NIST, Federal Information Processing Standards Publication (FIPS PUB) 180-4 Secure Hash Standard (SHS).
2479+3. OpenSSL \[Online\]. Available at https://www.openssl.org/docs/man3.0/man3/SHA512.html.
2480+4. N. W. Group, RFC 3394, Advanced Encryption Standard (AES) Key Wrap Algorithm, 2002.
2481+5. NIST, Federal Information Processing Standards Publication (FIPS) 198-1, The Keyed-Hash Message Authentication Code, 2008.
2482+6. N. W. Group, RFC 4868, Using HMAC-SHA256, HMAC-SHA384, and HMAC-SHA512 with IPsec, 2007.
2483+7. RFC 6979, Deterministic Usage of the Digital Signature Algorithm (DSA) and Elliptic Curve Digital Signature Algorithm (ECDSA), 2013.
2484+8. TCG, Hardware Requirements for a Device Identifier Composition Engine, 2018.
2485+9. Coron, J.-S.: Resistance against differential power analysis for elliptic curve cryptosystems. In: Koç, Ç.K., Paar, C. (eds.) CHES 1999. LNCS, vol. 1717, pp. 292–302.
2486+10. Schindler, W., Wiemers, A.: Efficient side-channel attacks on scalar blinding on elliptic curves with special structure. In: NIST Workshop on ECC Standards (2015).
2487+11. National Institute of Standards and Technology, "Digital Signature Standard (DSS)", Federal Information Processing Standards Publication (FIPS PUB) 186-4, July 2013.
17072488 12. NIST SP 800-90A, Rev 1: "Recommendation for Random Number Generation Using Deterministic Random Bit Generators", 2012.
1708-13. CHIPS Alliance, “RISC-V VeeR EL2 Programmer’s Reference Manual” \[Online\] Available at https://github.com/chipsalliance/Cores-VeeR-EL2/blob/main/docs/RISC-V_VeeR_EL2_PRM.pdf.
1709-14. “The RISC-V Instruction Set Manual, Volume I: User-Level ISA, Document Version 20191213”, Editors Andrew Waterman and Krste Asanović, RISC-V Foundation, December 2019. Available at https://riscv.org/technical/specifications/.
1710-15. “The RISC-V Instruction Set Manual, Volume II: Privileged Architecture, Document Version 20211203”, Editors Andrew Waterman, Krste Asanović, and John Hauser, RISC-V International, December 2021. Available at https://riscv.org/technical/specifications/.
2489+13. CHIPS Alliance, “RISC-V VeeR EL2 Programmer’s Reference Manual” \[Online\] Available at https://github.com/chipsalliance/Cores-VeeR-EL2/blob/main/docs/RISC-V_VeeR_EL2_PRM.pdf.
2490+14. “The RISC-V Instruction Set Manual, Volume I: User-Level ISA, Document Version 20191213”, Editors Andrew Waterman and Krste Asanović, RISC-V Foundation, December 2019. Available at https://riscv.org/technical/specifications/.
2491+15. “The RISC-V Instruction Set Manual, Volume II: Privileged Architecture, Document Version 20211203”, Editors Andrew Waterman, Krste Asanović, and John Hauser, RISC-V International, December 2021. Available at https://riscv.org/technical/specifications/.
2492+16. NIST SP 800-56A, Rev 3: "Recommendation for Pair-Wise Key-Establishment Schemes Using Discrete Logarithm Cryptography", 2018.
17112493
17122494 <sup>[1]</sup> _Caliptra. Spanish for “root cap” and describes the deepest part of the root_

Image Changes

| Image | v1.2 | v2.0 |
| :---- | :--- | :--- |
| CONFIGOPTS.png | Old version | Not present |
| Caliptra_eq_CLKDIV.png | Old version | Not present |
| Caliptra_eq_NCO.png | Old version | Not present |
| Caliptra_eq_SPI_clk_period.png | Old version | Not present |
| Caliptra_eq_UART.png | Old version | Not present |
| Caliptra_eq_UART2.png | Old version | Not present |
| Crypto-2p0.png | Not present | New version |
| ECC_arch.png | Not present | New version |
| ECDSA_arch.png | Old version | Not present |
| GHASH_TVLA_Figure10ab.png | Not present | New version |
| GHASH_TVLA_Figure11ab.png | Not present | New version |
| GHASH_TVLA_Figure12ab.png | Not present | New version |
| GHASH_TVLA_Figure13ab.png | Not present | New version |
| GHASH_TVLA_Figure14ab.png | Not present | New version |
| GHASH_TVLA_Figure15ab.png | Not present | New version |
| GHASH_TVLA_Figure2.png | Not present | New version |
| GHASH_TVLA_Figure3.png | Not present | New version |
| GHASH_TVLA_Figure4.png | Not present | New version |
| GHASH_TVLA_Figure5.png | Not present | New version |
| GHASH_TVLA_Figure6ab.png | Not present | New version |
| GHASH_TVLA_Figure7ab.png | Not present | New version |
| GHASH_TVLA_Figure8ab.png | Not present | New version |
| GHASH_TVLA_Figure9ab.png | Not present | New version |
| HMAC_SHA_512_256.png | Not present | New version |
| HMAC_pseudo.png | Old version | New version |
| QSPI_flash.png | Old version | Not present |
| QSPI_segments.png | Old version | Not present |
| SPI_read.png | Old version | Not present |
| UART_block.png | Old version | Not present |
| crypto_subsystem.png | Old version | Not present |
| serial_transmission.png | Old version | Not present |

v1.2: sharedkey_pseudo.png

Image not present in this version

v2.0: sharedkey_pseudo.png

New version