| @@ -1,12 +1,12 @@ | |||
| 1 | 1 | <div style="font-size: 0.85em; color: #656d76; margin-bottom: 1em; padding: 0.5em; background: #f6f8fa; border-radius: 4px;"> | |
| 2 | -📄 Source: <a href="https://github.com/chipsalliance/caliptra-rtl/blob/5f85fb4bc95b753a2f7d042db7dc2644ca1e8c49/docs/CaliptraHardwareSpecification.md" target="_blank">chipsalliance/caliptra-rtl/docs/CaliptraHardwareSpecification.md</a> @ <code>5f85fb4</code> | ||
| 2 | +📄 Source: <a href="https://github.com/chipsalliance/caliptra-rtl/blob/35b0bc5691b2bd0fc180403914cfabe207379089/docs/CaliptraHardwareSpecification.md" target="_blank">chipsalliance/caliptra-rtl/docs/CaliptraHardwareSpecification.md</a> @ <code>35b0bc5</code> | ||
| 3 | 3 | </div> | |
| 4 | 4 | ||
| 5 | 5 |  | |
| 6 | 6 | ||
| 7 | 7 | <p style="text-align: center;">Caliptra Hardware Specification</p> | |
| 8 | 8 | ||
| 9 | -<p style="text-align: center;">Version 1.1</p> | ||
| 9 | +<p style="text-align: center;">Revision 2.0.3</p> | ||
| 10 | 10 | ||
| 11 | 11 | <div style="page-break-after: always"></div> | |
| 12 | 12 | ||
| @@ -21,6 +21,23 @@ | |||
| 21 | 21 | # Caliptra Core | |
| 22 | 22 | ||
| 23 | 23 | For information on the Caliptra Core, see the [High level architecture](https://chipsalliance.github.io/Caliptra/doc/Caliptra.html#high-level-architecture) section of [Caliptra: A Datacenter System on a Chip (SoC) Root of Trust (RoT)](https://chipsalliance.github.io/Caliptra/doc/Caliptra.html). | |
| 24 | + | ||
| 25 | +## Key Caliptra Core 2.0 Changes | ||
| 26 | +* AXI subordinate replaces APB interface of Caliptra 1.x hardware | ||
| 27 | +* SHA Accelerator functionality now available exclusively to Caliptra | ||
| 28 | + * Caliptra uC may use it internally in mailbox mode, or in streaming mode via the Caliptra AXI DMA assist engine | ||
| 29 | + * SHA Accelerator adds new SHA save/restore functionality | ||
| 30 | +* Adams Bridge Dilithium/ML-DSA (refer to [Adams bridge spec](https://github.com/chipsalliance/adams-bridge/blob/main/docs/AdamsBridgeHardwareSpecification.md)) | ||
| 31 | +* Subsystem mode support (refer to [Subsystem Specification](https://github.com/chipsalliance/caliptra-ss/blob/main/docs/Caliptra%202.0%20Subsystem%20Specification%201.pdf) for details) | ||
| 32 | + * ECDH hardware support | ||
| 33 | + * HMAC512 hardware support | ||
| 34 | + * AXI Manager with DMA support (refer to [DMA Specification](https://github.com/chipsalliance/caliptra-ss/blob/main/docs/CaliptraSSHardwareSpecification.md#caliptra-axi-manager--dma-assist)) | ||
| 35 | + * Manufacturing and Debug Unlock | ||
| 36 | + * UDS programming | ||
| 37 | + * Read logic for Secret Fuses | ||
| 38 | + * Streaming Boot Support | ||
| 39 | +* RISC-V core PMP support | ||
| 40 | +* CSR HMAC key for manufacturing flow | ||
| 24 | 41 | ||
| 25 | 42 | ## Boot FSM | |
| 26 | 43 | ||
| @@ -57,12 +74,13 @@ | |||
| 57 | 74 | | Parameter | Configuration | | |
| 58 | 75 | | :---------------------- | :------------ | | |
| 59 | 76 | | Interface | AHB-Lite | | |
| 60 | -| DCCM | 128 KiB | | ||
| 61 | -| ICCM | 128 KiB | | ||
| 77 | +| DCCM | 256 KiB | | ||
| 78 | +| ICCM | 256 KiB | | ||
| 62 | 79 | | I-Cache | Disabled | | |
| 63 | 80 | | Reset Vector | 0x00000000 | | |
| 64 | 81 | | Fast Interrupt Redirect | Enabled | | |
| 65 | 82 | | External Interrupts | 31 | | |
| 83 | +| PMP | Enabled | | ||
| 66 | 84 | ||
| 67 | 85 | ||
| 68 | 86 | ### Embedded memory export | |
| @@ -75,12 +93,12 @@ | |||
| 75 | 93 | ||
| 76 | 94 | | Subsystem | Address size | Start address | End address | | |
| 77 | 95 | | :------------------ | :----------- | :------------ | :---------- | | |
| 78 | -| ROM | 48 KiB | 0x0000_0000 | 0x0000_BFFF | | ||
| 96 | +| ROM | 96 KiB | 0x0000_0000 | 0x0001_7FFF | | ||
| 79 | 97 | | Cryptographic | 512 KiB | 0x1000_0000 | 0x1007_FFFF | | |
| 80 | 98 | | Peripherals | 32 KiB | 0x2000_0000 | 0x2000_7FFF | | |
| 81 | -| SoC IFC | 256 KiB | 0x3000_0000 | 0x3003_FFFF | | ||
| 82 | -| RISC-V Core ICCM | 128 KiB | 0x4000_0000 | 0x4001_FFFF | | ||
| 83 | -| RISC-V Core DCCM | 128 KiB | 0x5000_0000 | 0x5001_FFFF | | ||
| 99 | +| SoC IFC | 512 KiB | 0x3000_0000 | 0x3007_FFFF | | ||
| 100 | +| RISC-V Core ICCM | 256 KiB | 0x4000_0000 | 0x4003_FFFF | | ||
| 101 | +| RISC-V Core DCCM | 256 KiB | 0x5000_0000 | 0x5003_FFFF | | ||
| 84 | 102 | | RISC-V MM CSR (PIC) | 256 MiB | 0x6000_0000 | 0x6FFF_FFFF | | |
| 85 | 103 | ||
| 86 | 104 | ||
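As a quick consistency check on the resized regions in the address map above, each end address should equal the start address plus the size, minus one. The following is a minimal Python sketch; the region list is transcribed from the rows changed in this revision, and the helper names (`end_address`, `find_region`, `fmt`) are illustrative only and not part of the specification.

```python
KIB = 1024

# (name, size in bytes, start address), transcribed from the address map above.
REGIONS = [
    ("SoC IFC", 512 * KIB, 0x3000_0000),
    ("RISC-V Core ICCM", 256 * KIB, 0x4000_0000),
    ("RISC-V Core DCCM", 256 * KIB, 0x5000_0000),
]


def end_address(start: int, size: int) -> int:
    """Last byte address covered by a region of `size` bytes starting at `start`."""
    return start + size - 1


def find_region(addr: int):
    """Return the name of the listed region containing `addr`, or None."""
    for name, size, start in REGIONS:
        if start <= addr <= end_address(start, size):
            return name
    return None


def fmt(addr: int) -> str:
    """Format a 32-bit address in the table's 0xXXXX_XXXX style."""
    return f"0x{addr >> 16:04X}_{addr & 0xFFFF:04X}"


if __name__ == "__main__":
    for name, size, start in REGIONS:
        print(f"{name}: {fmt(start)} .. {fmt(end_address(start, size))}")
```

Running the script prints the start and end of each listed region, which can be compared against the table by eye.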
| @@ -92,12 +110,14 @@ | |||
| 92 | 110 | | :---------------------------------- | :-------- | :----------- | :------------ | :---------- | | |
| 93 | 111 | | Cryptographic Initialization Engine | 0 | 32 KiB | 0x1000_0000 | 0x1000_7FFF | | |
| 94 | 112 | | ECC Secp384 | 1 | 32 KiB | 0x1000_8000 | 0x1000_FFFF | | |
| 95 | -| HMAC384 | 2 | 4 KiB | 0x1001_0000 | 0x1001_0FFF | | ||
| 113 | +| HMAC512 | 2 | 4 KiB | 0x1001_0000 | 0x1001_0FFF | | ||
| 96 | 114 | | Key Vault | 3 | 8 KiB | 0x1001_8000 | 0x1001_9FFF | | |
| 97 | 115 | | PCR Vault | 4 | 8 KiB | 0x1001_A000 | 0x1001_BFFF | | |
| 98 | 116 | | Data Vault | 5 | 8 KiB | 0x1001_C000 | 0x1001_DFFF | | |
| 99 | 117 | | SHA512 | 6 | 32 KiB | 0x1002_0000 | 0x1002_7FFF | | |
| 100 | -| SHA256 | 13 | 32 KiB | 0x1002_8000 | 0x1002_FFFF | | ||
| 118 | +| SHA256 | 10 | 32 KiB | 0x1002_8000 | 0x1002_FFFF | | ||
| 119 | +| ML-DSA | 14 | 64 KiB | 0x1003_0000 | 0x1003_FFFF | | ||
| 120 | +| AES | 15 | 4 KiB | 0x1001_1000 | 0x1001_1FFF | | ||
| 101 | 121 | ||
| 102 | 122 | ||
| 103 | 123 | #### Peripherals subsystem | |
| @@ -106,10 +126,8 @@ | |||
| 106 | 126 | ||
| 107 | 127 | | IP/Peripheral | Target \# | Address size | Start address | End address | | |
| 108 | 128 | | :------------ | :-------- | :----------- | :------------ | :---------- | | |
| 109 | -| QSPI | 7 | 4 KiB | 0x2000_0000 | 0x2000_0FFF | | ||
| 110 | -| UART | 8 | 4 KiB | 0x2000_1000 | 0x2000_1FFF | | ||
| 111 | -| CSRNG | 15 | 4 KiB | 0x2000_2000 | 0x2000_2FFF | | ||
| 112 | -| ENTROPY SRC | 16 | 4 KiB | 0x2000_3000 | 0x2000_3FFF | | ||
| 129 | +| CSRNG | 12 | 4 KiB | 0x2000_2000 | 0x2000_2FFF | | ||
| 130 | +| ENTROPY SRC | 13 | 4 KiB | 0x2000_3000 | 0x2000_3FFF | | ||
| 113 | 131 | ||
| 114 | 132 | ||
| 115 | 133 | #### SoC interface subsystem | |
| @@ -118,10 +136,11 @@ | |||
| 118 | 136 | ||
| 119 | 137 | | IP/Peripheral | Target \# | Address size | Start address | End address | | |
| 120 | 138 | | :------------------------- | :-------- | :----------- | :------------ | :---------- | | |
| 121 | -| Mailbox SRAM Direct Access | 10 | 128 KiB | 0x3000_0000 | 0x3001_FFFF | | ||
| 122 | -| Mailbox CSR | 10 | 4 KiB | 0x3002_0000 | 0x3002_0FFF | | ||
| 123 | -| SHA512 Accelerator CSR | 10 | 4 KiB | 0x3002_1000 | 0x3002_1FFF | | ||
| 124 | -| Mailbox | 10 | 64 KiB | 0x3003_0000 | 0x3003_FFFF | | ||
| 139 | +| Mailbox CSR | 7 | 4 KiB | 0x3002_0000 | 0x3002_0FFF | | ||
| 140 | +| SHA512 Accelerator | 7 | 4 KiB | 0x3002_1000 | 0x3002_1FFF | | ||
| 141 | +| AXI DMA | 7 | 4 KiB | 0x3002_2000 | 0x3002_2FFF | | ||
| 142 | +| SOC IFC CSR | 7 | 64 KiB | 0x3003_0000 | 0x3003_FFFF | | ||
| 143 | +| Mailbox SRAM Direct Access | 7 | 256 KiB | 0x3004_0000 | 0x3007_FFFF | | ||
| 125 | 144 | ||
| 126 | 145 | ||
| 127 | 146 | #### RISC-V core local memory blocks | |
| @@ -130,8 +149,8 @@ | |||
| 130 | 149 | ||
| 131 | 150 | | IP/Peripheral | Target \# | Address size | Start address | End address | | |
| 132 | 151 | | :-------------- | :-------- | :----------- | :------------ | :---------- | | |
| 133 | -| ICCM0 (via DMA) | 12 | 128 KiB | 0x4000_0000 | 0x4001_FFFF | | ||
| 134 | -| DCCM | 11 | 128 KiB | 0x5000_0000 | 0x5001_FFFF | | ||
| 152 | +| ICCM0 (via DMA) | 9 | 256 KiB | 0x4000_0000 | 0x4003_FFFF | | ||
| 153 | +| DCCM | 8 | 256 KiB | 0x5000_0000 | 0x5003_FFFF | | ||
| 135 | 154 | ||
| 136 | 155 | ||
| 137 | 156 | ### Interrupts | |
| @@ -171,14 +190,16 @@ | |||
| 171 | 190 | | SHA512 (Notifications) | 10 | 7 | | |
| 172 | 191 | | SHA256 (Errors) | 11 | 8 | | |
| 173 | 192 | | SHA256 (Notifications) | 12 | 7 | | |
| 174 | -| QSPI (Errors) | 13 | 4 | | ||
| 175 | -| QSPI (Notifications) | 14 | 3 | | ||
| 176 | -| UART (Errors) | 15 | 4 | | ||
| 177 | -| UART (Notifications) | 16 | 3 | | ||
| 178 | -| RESERVED | 17 | 4 | | ||
| 179 | -| RESERVED | 18 | 3 | | ||
| 193 | +| RESERVED | 13, 15, 17 | 4 | | ||
| 194 | +| RESERVED | 14, 16, 18 | 3 | | ||
| 180 | 195 | | Mailbox (Errors) | 19 | 8 | | |
| 181 | 196 | | Mailbox (Notifications) | 20 | 7 | | |
| 197 | +| SHA512 Accelerator (Errors) | 23 | 8 | | ||
| 198 | +| SHA512 Accelerator (Notifications) | 24 | 7 | | ||
| 199 | +| MLDSA (Errors) | 23 | 8 | | ||
| 200 | +| MLDSA (Notifications) | 24 | 7 | | ||
| 201 | +| AXI DMA (Errors) | 25 | 8 | | ||
| 202 | +| AXI DMA (Notifications) | 26 | 7 | | ||
| 182 | 203 | ||
| 183 | 204 | ||
| 184 | 205 | ## Watchdog timer | |
| @@ -230,182 +251,18 @@ | |||
| 230 | 251 | ||
| 231 | 252 | As a result of this implementation, 64-bit data transfers are not supported on the Caliptra AHB fabric. Firmware running on the internal microprocessor may only access memory and registers using a 32-bit or smaller request size, as 64-bit transfer requests will be corrupted. | |
| 232 | 253 | ||
| 254 | +All AHB requests internal to Caliptra must target an address aligned to the native data width of 4 bytes. Any AHB read or write by the Caliptra RISC-V processor that is not aligned to this boundary fails to decode to the targeted register: writes do not store the submitted data, and reads return all zeroes. All AHB requests must also use the native size of 4 bytes (encoded in the hsize signal with a value of 2). The only exception is when the RISC-V processor performs byte-aligned, single-byte reads to the Mailbox SRAM using the direct-access mechanism described in [SoC Mailbox](#SoC-mailbox). In this case, a byte-aligned address must be accompanied by the correct size indicator for a single-byte access. Hardware aligns the read address of a byte access to the 4-byte boundary, and the read completes successfully with the correct data at the specified byte offset. Direct mode SRAM writes must be 4 bytes in size and aligned to the 4-byte boundary. Hardware writes the entire dword of data to the aligned address, so attempts to write a partial word of data may result in data corruption. | ||
| 255 | + | ||
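The alignment and size rules above can be summarized as a small predicate. This is a hedged Python sketch of the rules as written, not a model of the RTL: the function name `ahb_request_ok` is hypothetical, the Mailbox SRAM direct-access range is taken from the SoC interface subsystem table, and the single-byte `hsize` value of 0 is the standard AMBA AHB encoding rather than something this section states.

```python
HSIZE_BYTE = 0  # standard AHB encoding for a 1-byte transfer (assumption)
HSIZE_WORD = 2  # 4-byte transfer, the value given in the text

# Mailbox SRAM direct-access window, from the SoC interface subsystem table.
MBOX_SRAM_START = 0x3004_0000
MBOX_SRAM_END = 0x3007_FFFF


def ahb_request_ok(addr: int, hsize: int, is_write: bool) -> bool:
    """Return True if the request follows the rules stated in the text."""
    in_mbox_sram = MBOX_SRAM_START <= addr <= MBOX_SRAM_END
    if hsize == HSIZE_WORD:
        # Native 4-byte access: the address must be 4-byte aligned.
        return addr % 4 == 0
    if hsize == HSIZE_BYTE and in_mbox_sram and not is_write:
        # The only exception: single-byte reads of Mailbox SRAM in direct mode.
        return True
    # Every other size, and any partial-word write, is outside the rules.
    return False
```

For example, `ahb_request_ok(0x3004_0001, HSIZE_BYTE, is_write=False)` is accepted, while the same address with `is_write=True` is rejected because direct mode writes must be full, aligned dwords.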
| 233 | 256 | ## Cryptographic subsystem | |
| 234 | 257 | ||
| 235 | 258 | For details, see the [Cryptographic subsystem architecture](#cryptographic-subsystem-architecture) section. | |
| 236 | 259 | ||
| 237 | -## Peripherals subsystem | ||
| 238 | - | ||
| 239 | -Caliptra includes QSPI and UART peripherals that are used to facilitate alternative operating modes and debug. In the first generation, Caliptra does not support enabling the QSPI interface. Similarly, the UART interface exists to facilitate firmware debug in an FPGA prototype, but should be disabled in final silicon. SystemVerilog defines used to disable these peripherals are described in the [Caliptra Integration Specification](https://github.com/chipsalliance/caliptra-rtl/blob/main/docs/CaliptraIntegrationSpecification.md). Operation of these peripherals is described in the following sections. | ||
| 240 | - | ||
| 241 | -### QSPI Flash Controller | ||
| 242 | - | ||
| 243 | -Caliptra implements a QSPI block that can communicate with 2 QSPI devices. This QSPI block is accessible to FW over the AHB-lite Interface. | ||
| 244 | - | ||
| 245 | -The QSPI block is composed of the spi\_host implementation. For information, see the [SPI\_HOST HWIP Technical Specification](https://opentitan.org/book/hw/ip/spi_host/index.html). The core code (see [spi\_host](https://github.com/lowRISC/opentitan/tree/master/hw/ip/spi_host)) is reused but the interface to the module is changed to AHB-lite and the number of chip select lines supported is increased to 2. The design provides support for Standard SPI, Dual SPI, or Quad SPI commands. The following figure shows the QSPI flash controller. | ||
| 246 | - | ||
| 247 | -*Figure 4: QSPI flash controller* | ||
| 248 | - | ||
| 249 | - | ||
| 250 | - | ||
| 251 | -#### Operation | ||
| 252 | - | ||
| 253 | -Transactions flow through the QSPI block starting with AHB-lite writes to the TXDATA FIFO. Commands are then written and processed by the control FSM, orchestrating transmissions from the TXDATA FIFO and receiving data into the RXDATA FIFO. | ||
| 254 | - | ||
| 255 | -The structure of a command depends on the device and the command itself. In the case of a standard SPI device, the host IP always transmits data on qspi\_d\_io[0] and always receives data from the target device on qspi\_d\_io[1]. In Dual or Quad modes, all data lines are bi-directional, thus allowing full bandwidth in transferring data across 4 data lines. | ||
| 256 | - | ||
| 257 | -A typical SPI command consists of different segments that are combined as shown in the following example. Each segment can configure the length, speed, and direction. As an example, the following SPI read transaction consists of 2 segments. | ||
| 258 | - | ||
| 259 | -*Figure 5: SPI read transaction segments* | ||
| 260 | - | ||
| 261 | - | ||
| 262 | - | ||
| 263 | -| Segment \# | Length (Bytes) | Speed | Direction | TXDATA FIFO | RXDATA FIFO | | ||
| 264 | -| :--------- | :------------- | :------- | :---------------- | :----------- | :----------------- | | ||
| 265 | -| 1 | 4 | standard | TX <br>qspi_d_io\[0\] | \[0\] 0x3 (ReadData) <br>\[1\] Addr\[23:16\] <br>\[2\] Addr\[15:8\] <br>\[3\] Addr\[7:0\] || | ||
| 266 | -| 2 | 1 | standard | RX <br>qspi_d_io\[1\] || \[0\] Data \[7:0\] | | ||
| 267 | - | ||
| 268 | - | ||
| 269 | -In this example, the ReadData (0x3) command was written to the TXDATA FIFO, followed by the 3B address. This maps to a total of 4 bytes that are transmitted out across qspi\_d\_io[0] in the first segment. The second segment consists of a read command that receives 1 byte of data from the target device across qspi\_d\_io[1]. | ||
| 270 | - | ||
| 271 | -QSPI consists of up to four command segments in which the host: | ||
| 272 | - | ||
| 273 | -1. Transmits instructions or data at the standard rate | ||
| 274 | -2. Transmits instructions address or data on 2 or 4 data lines | ||
| 275 | -3. Holds the bus in a high-impedance state for some number of dummy cycles where neither side transmits | ||
| 276 | -4. Receives information from the target device at the specified rate (derived from the original command) | ||
| 277 | - | ||
| 278 | -The following example shows the QSPI segments. | ||
| 279 | - | ||
| 280 | -*Figure 6: QSPI segments* | ||
| 281 | - | ||
| 282 | - | ||
| 283 | - | ||
| 284 | -| Segment \# | Length (Bytes) | Speed | Direction | TXDATA FIFO | RXDATA FIFO | | ||
| 285 | -| :--------- | :------------- | :------- | :------------------ | :----------- | :---------------- | | ||
| 286 | -| 1 | 1 | standard | TX <br>qspi_d_io\[3:0\] | \[0\] 0x6B (ReadDataQuad) || | ||
| 287 | -| 2 | 3\* | quad | TX <br>qspi_d_io\[3:0\] | \[1\] Addr\[23:16\] <br>\[2\] Addr\[15:8\] <br>\[3\] Addr\[7:0\] || | ||
| 288 | -| 3 | 2 | N/A | None (Dummy) ||| | ||
| 289 | -| 4 | 1 | quad | RX <br>qspi_d_io\[3:0\] || \[0\] Data\[7:0\] | | ||
| 290 | - | ||
| 291 | - | ||
| 292 | -Note: In the preceding figure, segment 2 doesn’t show bytes 2 and 3 for brevity. | ||
| 293 | - | ||
| 294 | -#### Configuration | ||
| 295 | - | ||
| 296 | -The CONFIGOPTS multi-register has one entry per CSB line and holds clock configuration and timing settings that are specific to each peripheral. After the CONFIGOPTS multi-register is programmed for each SPI peripheral device, the values can be left unchanged. | ||
| 297 | - | ||
| 298 | -The most common differences between target devices are the requirements for a specific SPI clock phase or polarity, CPOL and CPHA. These clock parameters can be set via the CONFIGOPTS.CPOL or CONFIGOPTS.CPHA register fields. | ||
| 299 | - | ||
| 300 | -The SPI clock rate depends on the peripheral clock and a 16b clock divider configured by CONFIGOPTS.CLKDIV. The following equation is used to configure the SPI clock period: | ||
| 301 | - | ||
| 302 | - | ||
| 303 | - | ||
| 304 | -By default, CLKDIV is set to 0, which means that the maximum frequency that can be achieved is at most half the frequency of the peripheral clock (Fsck = Fclk/2). | ||
| 305 | - | ||
| 306 | -We can rearrange the equation to solve for the CLKDIV: | ||
| 307 | - | ||
| 308 | - | ||
| 309 | - | ||
| 310 | -Assuming a 400MHz target peripheral, and a SPI clock target of 100MHz: | ||
| 311 | - | ||
| 312 | -CONFIGOPTS.CLKDIV = (400/(2\*100)) -1 = 1 | ||
| 313 | - | ||
| 314 | -The following figure shows CONFIGOPTS. | ||
| 315 | - | ||
| 316 | -*Figure 7: CONFIGOPTS* | ||
| 317 | - | ||
| 318 | - | ||
| 319 | - | ||
| 320 | -#### Signal descriptions | ||
| 321 | - | ||
| 322 | -The QSPI block architecture inputs and outputs are described in the following table. | ||
| 323 | - | ||
| 324 | -| Name | Input or output | Description | | ||
| 325 | -| :------------------ | :-------------- | :-------------------------------------------------------- | | ||
| 326 | -| clk_i | input | All signal timings are related to the rising edge of clk. | | ||
| 327 | -| rst_ni | input | The reset signal is active LOW and resets the core. | | ||
| 328 | -| cio_sck_o | output | SPI clock | | ||
| 329 | -| cio_sck_en_o | output | SPI clock enable | | ||
| 330 | -| cio_csb_o\[1:0\] | output | Chip select \# (one hot, active low) | | ||
| 331 | -| cio_csb_en_o\[1:0\] | output | Chip select \# enable (one hot, active low) | | ||
| 332 | -| cio_csb_sd_o\[3:0\] | output | SPI data output | | ||
| 333 | -| cio_csb_sd_en_o | output | SPI data output enable | | ||
| 334 | -| cio_csb_sd_i\[3:0\] | input | SPI data input | | ||
| 335 | - | ||
| 336 | - | ||
| 337 | -#### SPI\_HOST IP programming guide | ||
| 338 | - | ||
| 339 | -The operation of the SPI\_HOST IP proceeds in seven general steps. | ||
| 340 | - | ||
| 341 | -To initialize the IP: | ||
| 342 | - | ||
| 343 | -1. Program the CONFIGOPTS multi-register with the appropriate timing and polarity settings for each csb line. | ||
| 344 | -2. Set the desired interrupt parameters. | ||
| 345 | -3. Enable the IP. | ||
| 346 | - | ||
| 347 | -Then for each command: | ||
| 348 | - | ||
| 349 | -4. Load the data to be transmitted into the FIFO using the TXDATA memory window. | ||
| 350 | -5. Specify the target device by programming the CSID. | ||
| 351 | -6. Specify the structure of the command by writing each segment into the COMMAND register. | ||
| 352 | - | ||
| 353 | - For multi-segment transactions, assert COMMAND.CSAAT for all but the last command segment. | ||
| 354 | - | ||
| 355 | -7. For transactions that expect to receive a reply, the data can then be read back from the RXDATA window. | ||
| 356 | - | ||
| 357 | -Steps 4-7 are then repeated for each subsequent command. | ||
| 358 | - | ||
| 359 | -### UART | ||
| 360 | - | ||
| 361 | -Caliptra implements a UART block that can communicate with a serial device that is accessible to FW over the AHB-lite Interface. This is a configuration that the SoC opts-in by defining CALIPTRA\_INTERNAL\_UART. | ||
| 362 | - | ||
| 363 | -The UART block is composed of the uart implementation. For information, see the [UART HWIP Technical Specification](https://opentitan.org/book/hw/ip/uart/). The design provides support for a programmable baud rate. The UART block is shown in the following figure. | ||
| 364 | - | ||
| 365 | -*Figure 8: UART block* | ||
| 366 | - | ||
| 367 | - | ||
| 368 | - | ||
| 369 | -#### Operation | ||
| 370 | - | ||
| 371 | -Transactions flow through the UART block starting with an AHB-lite write to WDATA, which triggers the transmit module to start a UART TX serial data transfer. The TX module dequeues the byte from the internal FIFO and shifts it out bit by bit at the baud rate. If TX is not enabled, the output is set high and WDATA in the FIFO is queued up. | ||
| 372 | - | ||
| 373 | -The following figure shows the transmit data on the serial lane, starting with the START bit, which is indicated by a high to low transition, followed by the 8 bits of data. | ||
| 374 | - | ||
| 375 | -*Figure 9: Serial transmission frame* | ||
| 376 | - | ||
| 377 | - | ||
| 378 | - | ||
| 379 | -On the receive side, after the START bit is detected, the data is sampled at the center of each data bit and stored into a FIFO. A user can monitor the FIFO status and read the data out of RDATA. | ||
| 380 | - | ||
| 381 | -#### Configuration | ||
| 382 | - | ||
| 383 | -The baud rate can be configured using the CTRL.NCO register field. This should be set using the following equation: | ||
| 384 | - | ||
| 385 | - | ||
| 386 | - | ||
| 387 | -If the desired baud rate is 115,200bps: | ||
| 388 | - | ||
| 389 | - | ||
| 390 | - | ||
| 391 | - | ||
| 392 | - | ||
| 393 | -#### Signal descriptions | ||
| 394 | - | ||
| 395 | -The UART block architecture inputs and outputs are described in the following table. | ||
| 396 | - | ||
| 397 | -| Name | Input or output | Description | | ||
| 398 | -| :------- | :-------------- | :-------------------------------------------------------- | | ||
| 399 | -| clk_i | input | All signal timings are related to the rising edge of clk. | | ||
| 400 | -| rst_ni | input | The reset signal is active LOW and resets the core. | | ||
| 401 | -| cio_rx_i | input | Serial receive bit | | ||
| 402 | -| cio_tx_o | output | Serial transmit bit | | ||
| 403 | - | ||
| 404 | - | ||
| 405 | 260 | ## SoC mailbox | |
| 406 | 261 | ||
| 407 | 262 | For more information on the mailbox protocol, see [Mailbox](https://github.com/chipsalliance/caliptra-rtl/blob/main/docs/CaliptraIntegrationSpecification.md#mailbox) in the Caliptra Integration Specification. Mailbox registers accessible to the Caliptra microcontroller are defined in [internal-regs/mbox_csr](https://chipsalliance.github.io/caliptra-rtl/main/internal-regs/?p=clp.mbox_csr). | |
| 408 | 263 | ||
| 264 | +The RISC-V processor is able to access the SoC mailbox SRAM using a direct access mode (which bypasses the defined mailbox protocol). The addresses for performing this access are described in [SoC interface subsystem](#SoC-interface-subsystem) and in [mbox_sram](https://chipsalliance.github.io/caliptra-rtl/main/internal-regs/?p=clp.mbox_sram). In this mode, firmware must first acquire the mailbox lock. Then, reads and writes to the direct access address region will go directly to the SRAM block. Firmware must release the mailbox lock by writing to the [mbox_unlock](https://chipsalliance.github.io/caliptra-rtl/main/internal-regs/?p=clp.mbox_csr.mbox_unlock) register after direct access operations are completed. | ||
| 265 | + | ||
| 409 | 266 | ||
| 410 | 267 | ## Security state | |
| 411 | 268 | ||
| @@ -417,7 +274,7 @@ | |||
| 417 | 274 | ||
| 418 | 275 | * Caliptra JTAG is opened for the microcontroller and HW debug. | |
| 419 | 276 | ||
| 420 | -* Device secrets (UDS, FE, key vault, and obfuscation key) are programmed to debug values. | ||
| 277 | +* Device secrets (UDS, FE, key vault, CSR HMAC key, and obfuscation key) are programmed to debug values. | ||
| 421 | 278 | ||
| 422 | 279 | If a transition to debug mode happens during ROM operation, any values computed from the use of device secrets may not match expected values. | |
| 423 | 280 | ||
| @@ -428,11 +285,14 @@ | |||
| 428 | 285 | | Name | Default value | | |
| 429 | 286 | | :-------------------------- | :------------ | | |
| 430 | 287 | | Obfuscation Key Debug Value | All 0x1 | | |
| 288 | +| CSR HMAC Key Debug Value | All 0x1 | | ||
| 431 | 289 | | UDS Debug Value | All 0x1 | | |
| 432 | 290 | | Field Entropy Debug Value | All 0x1 | | |
| 433 | 291 | | Key Vault Debug Value 0 | All 0xA | | |
| 434 | 292 | | Key Vault Debug Value 1 | All 0x5 | | |
| 435 | 293 | ||
| 294 | + | ||
| 295 | +Note: When entering debug or scan mode, all crypto engines are zeroized. Before starting any crypto operation in these modes, the status registers of all crypto engines must be checked to confirm they are ready. Failing to do so may trigger a fatal error caused by concurrent crypto operations. | ||
| 436 | 296 | ||
| 437 | 297 | ## Clock gating | |
| 438 | 298 | ||
| @@ -472,17 +332,17 @@ | |||
| 472 | 332 | ||
| 473 | 333 | * JTAG accesses | |
| 474 | 334 | ||
| 475 | -* APB transactions | ||
| 476 | - | ||
| 477 | -Activity on the APB interface only wakes up the SoC IFC clock. All other clocks remain off until any other condition is met or the core exits the halt state. | ||
| 478 | - | ||
| 479 | -| Cpu_halt_status | PSEL | Generic input wires <br>|| fatal error <br>|| debug/scan mode <br> ||JTAG access | Expected behavior | | ||
| 335 | +* AXI transactions | ||
| 336 | + | ||
| 337 | +Activity on the AXI subordinate interface only wakes up the SoC IFC clock. All other clocks remain off until any other condition is met or the core exits the halt state. | ||
| 338 | + | ||
| 339 | +| Cpu_halt_status | s_axi_active | Generic input wires <br>\|\| fatal error <br>\|\| debug/scan mode <br>\|\| JTAG access | Expected behavior | | ||
| 480 | 340 | | :-------------- | :--- | :---------- | :-------------- | | |
| 481 | 341 | | 0 | X | X | All gated clocks active | | |
| 482 | 342 | | 1 | 0 | 0 | All gated clocks inactive | | |
| 483 | 343 | | 1 | 0 | 1 | All gated clocks active (as long as condition is true) | | |
| 484 | -| 1 | 1 | 0 | Soc_ifc_clk_cg active (as long as PSEL = 1) <br>All other clks inactive | | ||
| 485 | -| 1 | 1 | 1 | Soc_ifc_clk_cg active (as long as condition is true OR PSEL = 1) <br>All other clks active (as long as condition is true) | | ||
| 344 | +| 1 | 1 | 0 | Soc_ifc_clk_cg active (as long as s_axi_active = 1) <br>All other clks inactive | | ||
| 345 | +| 1 | 1 | 1 | Soc_ifc_clk_cg active (as long as condition is true OR s_axi_active = 1) <br>All other clks active (as long as condition is true) | | ||
| 486 | 346 | ||
| 487 | 347 | ||
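The truth table above reduces to two boolean expressions. The sketch below is one way to express it in Python; the function and argument names are illustrative, and `wake_condition` stands for the OR of generic input wires, fatal error, debug/scan mode, and JTAG access.

```python
def gated_clocks(cpu_halted: bool, s_axi_active: bool, wake_condition: bool):
    """Return (soc_ifc_clk_active, other_gated_clks_active) per the table above."""
    # All gated clocks run when the core is not halted or a wake condition holds.
    other_active = (not cpu_halted) or wake_condition
    # The SoC IFC clock additionally runs while the AXI subordinate is active.
    soc_ifc_active = other_active or s_axi_active
    return soc_ifc_active, other_active
```

With the core halted and only AXI activity present, `gated_clocks(True, True, False)` returns `(True, False)`: the SoC interface clock runs while all other gated clocks stay off.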
| 488 | 348 | ### Usage | |
| @@ -490,7 +350,7 @@ | |||
| 490 | 350 | The following applies to the clock gating feature: | |
| 491 | 351 | ||
| 492 | 352 | * The core should only be halted after all pending vault writes are done and cryptographic operations are complete. | |
| 493 | -* While the core is halted, any APB transaction wakes up the SoC interface clock and leaves all other clocks disabled. If the core is still halted when the APB transactions are done, the SoC interface clock is returned to a disabled state. . | ||
| 353 | +* While the core is halted, any AXI transaction wakes up the SoC interface clock and leaves all other clocks disabled. If the core is still halted when the AXI transactions are done, the SoC interface clock is returned to a disabled state. | ||
| 494 | 354 | * The RDC clock is similar to an ungated clock and is only disabled when a reset event occurs. This avoids metastability on flops. The RDC clock operates independently of core halt status. | |
| 495 | 355 | ||
| 496 | 356 | ||
| @@ -530,7 +390,7 @@ | |||
| 530 | 390 | ||
| 531 | 391 | ### Operation | |
| 532 | 392 | ||
| 533 | -Requests for entropy bits start with [command requests](https://opentitan.org/book/hw/ip/csrng/doc/theory_of_operation.html#general-command-format) over the AHB-lite interface to the csrng [CMD\_REQ](https://chipsalliance.github.io/caliptra-rtl/main/internal-regs/?p=clp.csrng_reg.CMD_REQ) register. | ||
| 393 | +Requests for entropy bits start with [command requests](https://opentitan.org/book/hw/ip/csrng/doc/theory_of_operation.html#general-command-format) over the AHB-lite interface to the csrng [CMD\_REQ](https://chipsalliance.github.io/caliptra-rtl/main/internal-regs/?p=clp.csrng_reg.CMD_REQ) register. | ||
| 534 | 394 | ||
| 535 | 395 | The following describes the fields of the command request header: | |
| 536 | 396 | ||
| @@ -542,7 +402,7 @@ | |||
| 542 | 402 | ||
| 543 | 403 | * Generate Length: Only defined for the generate command, this field is the total number of cryptographic entropy blocks requested. Each unit represents 128 bits of entropy returned. A value of 8 would return a total of 1024 bits. The maximum size supported is 4096. | |
| 544 | 404 | ||
| 545 | -First an instantiate command is requested over the SW application interface to initialize an instance in the CSRNG module. Depending on the flag0 and clen fields in the command header, a request to the entropy\_src module over the entropy interface is sent to seed the csrng. This can take a few milliseconds if the seed entropy is not immediately available. | ||
| 405 | +First an instantiate command is requested over the SW application interface to initialize an instance in the CSRNG module. Depending on the flag0 and clen fields in the command header, a request to the entropy\_src module over the entropy interface is sent to seed the csrng. This can take a few milliseconds if the seed entropy is not immediately available. | ||
| 546 | 406 | ||
| 547 | 407 | Example instantiation: | |
| 548 | 408 | ||
| @@ -560,7 +420,7 @@ | |||
| 560 | 420 | | T | 1-12 | Only provided additional data is used as seed. | | |
| 561 | 421 | ||
| 562 | 422 | ||
| 563 | -Next a generate command is used to request generation of cryptographic entropy bits. The glen field defines how many 128 bit words are to be returned to the application interface. After the generated bits are ready, they can be read out via the GENBITS register. This register must be read out glen \* 4 times for each request made. | ||
| 423 | +Next a generate command is used to request generation of cryptographic entropy bits. The glen field defines how many 128 bit words are to be returned to the application interface. After the generated bits are ready, they can be read out via the GENBITS register. This register must be read out glen \* 4 times for each request made. | ||
| 564 | 424 | ||
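The relationship between glen, the entropy returned, and the number of GENBITS reads is simple arithmetic. The following Python sketch illustrates it; the helper names are hypothetical, and the 32-bit GENBITS width is inferred from the "glen \* 4 reads per 128-bit block" statement rather than stated directly.

```python
BITS_PER_BLOCK = 128   # each glen unit returns 128 bits of entropy
GENBITS_WIDTH = 32     # inferred: 128 bits / 4 reads per block
MAX_GLEN = 4096        # maximum generate length given in the field description


def entropy_bits(glen: int) -> int:
    """Total entropy bits returned by a generate command with this glen."""
    if glen < 0 or glen > MAX_GLEN:
        raise ValueError(f"glen must be in 0..{MAX_GLEN}, got {glen}")
    return glen * BITS_PER_BLOCK


def genbits_reads(glen: int) -> int:
    """Number of GENBITS register reads needed to drain one generate request."""
    return entropy_bits(glen) // GENBITS_WIDTH
```

A glen of 8 therefore yields 1024 bits and requires 32 reads of GENBITS.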
| 565 | 425 | Example generate command: | |
| 566 | 426 | ||
| @@ -634,6 +494,111 @@ | |||
| 634 | 494 | ||
| 635 | 495 | The CSRNG may only be enabled if entropy\_src is enabled. After it is disabled, CSRNG may only be re-enabled after entropy\_src has been disabled and re-enabled. | |
| 636 | 496 | ||
| 497 | +### FIPS considerations | ||
| 498 | + | ||
| 499 | +The following sections illustrate the self-test parameter configuration. The | ||
| 500 | +`entropy_src` block provides additional tests, but Caliptra focuses primarily | ||
| 501 | +on the adaptive and repetition count tests, which are the ones strictly | ||
| 502 | +required for FIPS compliance. Additional details can be found in NIST | ||
| 503 | +publication SP 800-90B. | ||
| 504 | + | ||
| 505 | +The TRNG must be re-initialized whenever self-test parameter changes are | ||
| 506 | +needed. As described in the previous section, the initialization steps | ||
| 507 | +are as follows: | ||
| 508 | + | ||
| 509 | +1. Disable `csrng` and `entropy_src` in that order. | ||
| 510 | +2. Apply new self-test configuration. | ||
| 511 | +3. Enable `entropy_src` and `csrng` in that order. | ||
| 512 | + | ||
| 513 | +### Adaptive self-test window and thresholds | ||
| 514 | + | ||
| 515 | +This section details the configuration of the `entropy_src`, focusing on how | ||
| 516 | +the test window size for the adaptive self-test is determined and how it | ||
| 517 | +relates to threshold calculations. | ||
| 518 | + | ||
| 519 | +#### Understanding Test Window Sizes | ||
| 520 | + | ||
| 521 | +The adaptive self-test within the `entropy_src` block utilizes a | ||
| 522 | +configurable test window. To clarify its interpretation, two terms are | ||
| 523 | +defined: | ||
| 524 | + | ||
| 525 | +* `ENTROPY_TEST_WINDOW`: This refers to the test window size directly | ||
| 526 | + configured in the hardware registers of the `entropy_src` block. | ||
| 527 | +* `ACTUAL_TEST_WINDOW`: This refers to the effective window size used for | ||
| 528 | + the adaptive self-test threshold calculations. Its value depends on how | ||
| 529 | + the test scores are aggregated. | ||
| 530 | + | ||
| 531 | +The aggregation method is determined by the `CONF.THRESHOLD_SCOPE` setting in | ||
| 532 | +the `entropy_src` block. | ||
| 533 | + | ||
| 534 | +#### Aggregate per symbol | ||
| 535 | + | ||
| 536 | +When `CONF.THRESHOLD_SCOPE` is enabled: | ||
| 537 | + | ||
| 538 | +* The adaptive test combines the inputs from all physical entropy lines | ||
| 539 | + into a single, cumulative score. | ||
| 540 | +* The test essentially treats the combined input as a single binary stream, | ||
| 541 | + counting the occurrences of '1's. | ||
| 542 | +* In this configuration: | ||
| 543 | + * If `ENTROPY_TEST_WINDOW` is set to 1024, then | ||
| 544 | + * `ACTUAL_TEST_WINDOW` = `ENTROPY_TEST_WINDOW` = 1024 | ||
| 545 | + | ||
| 546 | +#### Handle each physical noise source separately | ||
| 547 | + | ||
| 548 | +When `CONF.THRESHOLD_SCOPE` is disabled: | ||
| 549 | + | ||
| 550 | +* The adaptive test scores each individual physical noise input line | ||
| 551 | + independently. | ||
| 552 | +* This allows for monitoring the health of each noise source. | ||
| 553 | +* In this configuration (assuming, for example, 4 noise sources): | ||
| 554 | + * If `ENTROPY_TEST_WINDOW` is set to 4096 bits, then | ||
| 555 | + * `ACTUAL_TEST_WINDOW` = (`ENTROPY_TEST_WINDOW` / 4) = 1024 | ||
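
The relationship between the two window sizes can be written as a short helper. This is a minimal sketch; the function name and the default of 4 noise sources are illustrative assumptions taken from the example values above.

```python
def actual_test_window(entropy_test_window, threshold_scope_enabled,
                       num_noise_sources=4):
    """Return ACTUAL_TEST_WINDOW for a configured ENTROPY_TEST_WINDOW.

    threshold_scope_enabled mirrors CONF.THRESHOLD_SCOPE:
      True  -> all lines are aggregated, the window is used as-is
      False -> each noise source is scored separately, so the window
               is split evenly across the noise sources
    """
    if threshold_scope_enabled:
        return entropy_test_window
    if entropy_test_window % num_noise_sources != 0:
        raise ValueError("window must be a multiple of the noise source count")
    return entropy_test_window // num_noise_sources
```

With the values used in this section, `actual_test_window(1024, True)` and `actual_test_window(4096, False)` both evaluate to 1024.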
| 556 | + | ||
| 557 | +#### Configuring adaptive self-test thresholds | ||
| 558 | + | ||
| 559 | +Once the `ACTUAL_TEST_WINDOW` is determined, the adaptive self-test | ||
| 560 | +thresholds can be configured as follows: | ||
| 561 | + | ||
| 562 | +* `ADAPTP_HI_THRESHOLDS.FIPS_THRESH` = `adaptp_cutoff` | ||
| 563 | +* `ADAPTP_LO_THRESHOLDS.FIPS_THRESH` = `ACTUAL_TEST_WINDOW` - `adaptp_cutoff` | ||
| 564 | + | ||
| 565 | +Here, `adaptp_cutoff` represents the pre-determined cutoff value for the | ||
| 566 | +adaptive proportion test, as defined by NIST SP 800-90B. See the threshold | ||
| 567 | +calculations below as an example. | ||
| 568 | + | ||
| 569 | +\\(α = 2^{-40}\\) (recommended)\ | ||
| 570 | +\\(H = 0.5\\) (example, estimated entropy measured from hardware)\ | ||
| 571 | +\\(W\\) = `ACTUAL_TEST_WINDOW`\ | ||
| 572 | +`adaptp_cutoff` = \\(1 + critbinom(W, 2^{-H}, 1 - α)\\) | ||
| 573 | + | ||
| 574 | +> Note: The `critbinom` function (critical binomial distribution function) is | ||
| 575 | +> implemented by most spreadsheet applications. | ||
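
For readers without a spreadsheet at hand, the cutoff can also be computed directly. The sketch below is one possible way to evaluate the formula, not a reference implementation: `critbinom_upper` and `adaptp_thresholds` are hypothetical helper names, and the tail probability is accumulated in floating point from the top of the distribution so that the comparison against a very small α stays numerically meaningful. It assumes 0 < p < 1.

```python
import math


def binom_log_pmf(n, k, p):
    """Natural log of P(X = k) for X ~ Binomial(n, p), with 0 < p < 1."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))


def critbinom_upper(n, p, alpha):
    """Smallest k with P(X > k) <= alpha, i.e. critbinom(n, p, 1 - alpha)."""
    tail = 0.0  # P(X > k) at the start of each iteration
    for k in range(n, -1, -1):
        pk = math.exp(binom_log_pmf(n, k, p))
        if tail + pk > alpha:
            # P(X > k - 1) would exceed alpha, so k is the smallest valid value.
            return k
        tail += pk
    return 0


def adaptp_thresholds(window, h, alpha=2.0 ** -40):
    """Return (hi, lo) FIPS_THRESH values for the adaptive proportion test."""
    cutoff = 1 + critbinom_upper(window, 2.0 ** -h, alpha)
    return cutoff, window - cutoff


# Example inputs from this section: W = 1024, H = 0.5, alpha = 2^-40.
hi_thresh, lo_thresh = adaptp_thresholds(1024, 0.5)
```

The two returned values correspond to `ADAPTP_HI_THRESHOLDS.FIPS_THRESH` and `ADAPTP_LO_THRESHOLDS.FIPS_THRESH` respectively. The entropy estimate H must come from measurements on the actual hardware.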
| 576 | + | ||
| 577 | +### Recommended configuration | ||
| 578 | + | ||
| 579 | +The following configuration is recommended for the adaptive and repetition | ||
| 580 | +count tests: | ||
| 581 | + | ||
| 582 | +#### Adaptive test | ||
| 583 | + | ||
| 584 | +1. Set `CONF.THRESHOLD_SCOPE` to disabled. This allows the test to monitor | ||
| 585 | + and score each physical noise source individually, providing more granular | ||
| 586 | + health information. | ||
| 587 | +2. Set `HEALTH_TEST_WINDOWS.FIPS_WINDOW` to 4096 bits. This value serves | ||
| 588 | + as the `ENTROPY_TEST_WINDOW`. With the current 4 noise source configuration, | ||
| 589 | + this is equivalent to 1024 bits per noise source, where each source produces | ||
| 590 | + 1 bit of entropy as defined in NIST SP 800-90B. | ||
| 591 | +3. Calculate thresholds. Use an `ACTUAL_TEST_WINDOW` of 1024 bits (derived | ||
| 592 | + from step 2) in the adaptive test threshold formulas provided earlier in | ||
| 593 | + this subsection. | ||
| 594 | + | ||
| 595 | +#### Repetition count test | ||
| 596 | + | ||
| 597 | +The methodology used for calculating the repetition count threshold in the | ||
| 598 | +ROM boot phase can be directly applied for this test as well. The threshold is | ||
| 599 | +applied on a per-noise-source basis. | ||
| 600 | + | ||
| 601 | + | ||
| 637 | 602 | ## External-TRNG REQ HW API | |
| 638 | 603 | ||
| 639 | 604 | For SoCs that choose to not instantiate Caliptra’s integrated TRNG, Caliptra provides a TRNGREQ HW API. | |
| @@ -647,18 +612,16 @@ | |||
| 647 | 612 | ||
| 648 | 613 | ## SoC-SHA accelerator HW API | |
| 649 | 614 | ||
| 650 | -Caliptra provides a SHA accelerator HW API for SoC and Caliptra internal FW to use. It is atomic in nature in that only one of them can use the SHA accelerator HW API at the same time. Details of the SHA accelerator register block may be found in the GitHub repository in [documentation](https://chipsalliance.github.io/caliptra-rtl/main/external-regs/?p=caliptra_top_reg.sha512_acc_csr) generated from the register definition file. | ||
| 615 | +Caliptra provides a SHA accelerator HW API for Caliptra internal FW to use via mailbox or via DMA operations through the AXI subordinate interface. The SHA accelerator HW API is restricted on AXI for use by Caliptra via the AXI DMA assist block; this access restriction is enforced by checking logic on the AXI AxUSER signal associated with the request. | ||
| 651 | 616 | ||
| 652 | 617 | Using the HW API: | |
| 653 | 618 | ||
| 654 | 619 | * A user of the HW API first locks the accelerator by reading the LOCK register. A read that returns the value 0 indicates that the resource was locked for exclusive use by the requesting user. A write of ‘1 clears the lock. | |
| 655 | -* The USER register captures the APB pauser value of the requestor that locked the SHA accelerator. This is the only user that is allowed to control the SHA accelerator by performing APB register writes. Writes by any other agent on the APB interface are dropped. | ||
| 656 | -* MODE register is written to set the SHA execution mode. | ||
| 657 | - * SHA accelerator supports both SHA384 and SHA512 modes of operation. | ||
| 658 | - * SHA supports **streaming** mode: SHA is computed on a stream of incoming data to the DATAIN register. The EXECUTE register, when set, indicates to the accelerator that streaming is complete. The accelerator can then publish the result into the DIGEST register. When the VALID bit of the STATUS register is set, then the result in the DIGEST register is valid. | ||
| 659 | - * SHA supports **Mailbox** mode: SHA is computed on LENGTH (DLEN) bytes of data stored in the mailbox beginning at START\_ADDRESS. This computation is performed when the EXECUTE register is set by the user. When the operation is completed and the result in the DIGEST register is valid, SHA accelerator sets the VALID bit of the STATUS register. | ||
| 660 | - * The SHA computation engine in the SHA accelerator requires big endian data, but the SHA accelerator can accommodate mailbox input data in either the little endian or big endian format. By default, input data is assumed to be little endian and is swizzled to big endian at the byte level prior to computation. For the big endian format, data is loaded into the SHA engine as-is. Users may configure the SHA accelerator to treat data as big endian by setting the ENDIAN\_TOGGLE bit appropriately. | ||
| 661 | - * See the register definition for the encodings. | ||
| 620 | +* The USER register captures the AXI USERID value of the requestor that locked the SHA accelerator. This is the only user that is allowed to control the SHA accelerator by performing AXI register writes. Writes by any other agent on the AXI subordinate interface are dropped. | ||
| 621 | +* SHA supports **Mailbox** mode: SHA is computed on LENGTH (DLEN) bytes of data stored in the mailbox beginning at START\_ADDRESS. This computation is performed when the EXECUTE register is set by the user. When the operation is completed and the result in the DIGEST register is valid, SHA accelerator sets the VALID bit of the STATUS register. | ||
| 622 | +* Note that even though the mailbox size is fixed, the SHA save/restore enhancement means there is no limit on the size of the data that can be hashed. The SoC needs to follow the FW API. | ||
| 623 | +* The SHA computation engine in the SHA accelerator requires big endian data, but the SHA accelerator can accommodate mailbox input data in either the little endian or big endian format. By default, input data is assumed to be little endian and is swizzled to big endian at the byte level prior to computation. For the big endian format, data is loaded into the SHA engine as-is. Users may configure the SHA accelerator to treat data as big endian by setting the ENDIAN\_TOGGLE bit appropriately. | ||
| 624 | +* See the register definition for the encodings. | ||
| 662 | 625 | * SHA engine also provides a ‘zeroize’ function through its CONTROL register to clear any of the SHA internal state. This can be used when the user wants to conceal previous state for debug or security reasons. | |
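
The LOCK and USER behavior described in the list above can be modeled in a few lines. This is a behavioral sketch only: the class and method names are hypothetical, and returning 1 for a read that fails to acquire the lock is an assumption (the text only states that a read of 0 means the lock was acquired).

```python
class ShaAccLockModel:
    """Hypothetical behavioral model of the SHA accelerator LOCK/USER registers."""

    def __init__(self):
        self.locked = False
        self.user = None  # AXI USERID captured when the lock is taken

    def read_lock(self, axi_user):
        """Read LOCK. Returns 0 if this read acquired the lock, else 1."""
        if not self.locked:
            self.locked = True
            self.user = axi_user
            return 0
        return 1  # assumption: nonzero value while the lock is held

    def write_lock(self, axi_user, value):
        """Write LOCK. Only the owner writing 1 clears the lock.

        Returns True if the write took effect, False if it was dropped.
        """
        if self.locked and axi_user == self.user and value == 1:
            self.locked = False
            self.user = None
            return True
        return False
```

In this model a second requester reading LOCK while it is held sees a nonzero value, and its writes are dropped until the owner releases the lock.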
| 663 | 626 | ||
| 664 | 627 | ## JTAG implementation | |
| @@ -683,7 +646,7 @@ | |||
| 683 | 646 | * De-obfuscation engine | |
| 684 | 647 | * SHA512/384 (based on NIST FIPS 180-4 [2]) | |
| 685 | 648 | * SHA256 (based on NIST FIPS 180-4 [2]) | |
| 686 | - * HMAC384 (based on [NIST FIPS 198-1](https://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.198-1.pdf) [5] and [RFC 4868](https://tools.ietf.org/html/rfc4868) [6]) | ||
| 649 | + * HMAC512 (based on [NIST FIPS 198-1](https://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.198-1.pdf) [5] and [RFC 4868](https://tools.ietf.org/html/rfc4868) [6]) | ||
| 687 | 650 | * Public-key cryptography | |
| 688 | 651 | * NIST Secp384r1 Deterministic Digital Signature Algorithm (based on FIPS-186-4 [11] and RFC 6979 [7]) | |
| 689 | 652 | * Key vault | |
| @@ -694,7 +657,7 @@ | |||
| 694 | 657 | ||
| 695 | 658 | *Figure 17: Caliptra cryptographic subsystem* | |
| 696 | 659 | ||
| 697 | - | ||
| 660 | + | ||
| 698 | 661 | ||
| 699 | 662 | ## SHA512/SHA384 | |
| 700 | 663 | ||
| @@ -927,13 +890,13 @@ | |||
| 927 | 890 | | 1 KiB message | 8761 | 21.90 | 45,657 | | |
| 928 | 891 | ||
| 929 | 892 | ||
| 930 | -## HMAC384 | ||
| 931 | - | ||
| 932 | -Hash-based message authentication code (HMAC) is a cryptographic authentication technique that uses a hash function and a secret key. HMAC involves a cryptographic hash function and a secret cryptographic key. This implementation supports HMAC-SHA-384-192 as specified in [NIST FIPS 198-1](https://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.198-1.pdf) [5]. The implementation is compatible with the HMAC-SHA-384-192 authentication and integrity functions defined in [RFC 4868](https://tools.ietf.org/html/rfc4868) [6]. | ||
| 933 | - | ||
| 934 | -Caliptra HMAC implementation uses SHA384 as the hash function, accepts a 384-bit key, and generates a 384-bit tag. | ||
| 935 | - | ||
| 936 | -The implementation also supports PRF-HMAC-SHA-384. The PRF-HMAC-SHA-384 algorithm is identical to HMAC-SHA-384-192, except that variable-length keys are permitted, and the truncation step is not performed. | ||
| 893 | +## HMAC512/HMAC384 | ||
| 894 | + | ||
| 895 | +Hash-based message authentication code (HMAC) is a cryptographic authentication technique that uses a hash function and a secret key. This implementation supports the HMAC512 variants HMAC-SHA-512-256 and HMAC-SHA-384-192 as specified in [NIST FIPS 198-1](https://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.198-1.pdf) [5]. The implementation is compatible with the HMAC-SHA-512-256 and HMAC-SHA-384-192 authentication and integrity functions defined in [RFC 4868](https://tools.ietf.org/html/rfc4868) [6]. | ||
| 896 | + | ||
| 897 | +The Caliptra HMAC implementation uses SHA512 as the hash function, accepts a 512-bit key, and generates a 512-bit tag. | ||
| 898 | + | ||
| 899 | +The implementation also supports PRF-HMAC-SHA-512. The PRF-HMAC-SHA-512 algorithm is identical to HMAC-SHA-512-256, except that variable-length keys are permitted, and the truncation step is not performed. | ||
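
The difference between the truncated and untruncated variants can be illustrated in software with Python's standard `hmac` and `hashlib` modules. This is a software sketch for comparison only, not a model of the hardware core; the function names are illustrative.

```python
import hashlib
import hmac


def prf_hmac_sha512(key: bytes, msg: bytes) -> bytes:
    """Full 512-bit tag, no truncation (PRF-HMAC-SHA-512)."""
    return hmac.new(key, msg, hashlib.sha512).digest()


def hmac_sha512_256(key: bytes, msg: bytes) -> bytes:
    """HMAC-SHA-512-256: keep the 256 most significant bits of the tag."""
    return prf_hmac_sha512(key, msg)[:32]


def hmac_sha384_192(key: bytes, msg: bytes) -> bytes:
    """HMAC-SHA-384-192: keep the 192 most significant bits of the tag."""
    return hmac.new(key, msg, hashlib.sha384).digest()[:24]
```

The truncated variants differ from the PRF variant only in how many leading bytes of the tag are read out.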
| 937 | 900 | ||
| 938 | 901 | The HMAC algorithm is described as follows: | |
| 939 | 902 | * The key is fed to the HMAC core to be padded | |
| @@ -980,9 +943,15 @@ | |||
| 980 | 943 | ||
| 981 | 944 | #### Hashing | |
| 982 | 945 | ||
| 983 | -The HMAC core performs the sha2-384 function to process the hash value of the given message. The algorithm processes each block of the 1024 bits from the message, using the result from the previous block. This data flow is shown in the following figure. | ||
| 984 | - | ||
| 985 | -*Figure 28: HMAC-SHA-384-192 data flow* | ||
| 946 | +The HMAC512 core uses the SHA2-512 function to compute the hash value of the given message. The algorithm processes each 1024-bit block of the message, using the result from the previous block. This data flow is shown in the following figure. | ||
| 947 | + | ||
| 948 | +*Figure 28: HMAC-SHA-512-256 data flow* | ||
| 949 | + | ||
| 950 | + | ||
| 951 | + | ||
| 952 | +The HMAC384 core uses the SHA2-384 function to compute the hash value of the given message. The algorithm processes each 1024-bit block of the message, using the result from the previous block. This data flow is shown in the following figure. | ||
| 953 | + | ||
| 954 | +*Figure 29: HMAC-SHA-384-192 data flow* | ||
| 986 | 955 | ||
| 987 | 956 |  | |
| 988 | 957 | ||
| @@ -990,26 +959,33 @@ | |||
| 990 | 959 | ||
| 991 | 960 | The HMAC architecture has the finite-state machine as shown in the following figure. | |
| 992 | 961 | ||
| 993 | -*Figure 29: HMAC FSM* | ||
| 962 | +*Figure 30: HMAC FSM* | ||
| 994 | 963 | ||
| 995 | 964 |  | |
| 996 | 965 | ||
| 966 | +### CSR Mode | ||
| 967 | + | ||
| 968 | +When the CSR Mode register is set, the HMAC512 core uses the value latched from the cptra_csr_hmac_key interface pins in place of the API key register. These pins are latched internally after powergood assertion during DEVICE_MANUFACTURING lifecycle state. During debug mode operation this value is overridden with all 1's, and during any other lifecycle state it has a value of zero. | ||
| 969 | + | ||
| 997 | 970 | ### Signal descriptions | |
| 998 | 971 | ||
| 999 | 972 | The HMAC architecture inputs and outputs are described in the following table. | |
| 1000 | 973 | ||
| 1001 | 974 | | Name | Input or output | Description | | |
| 1002 | -| :----------------- | :-------------- | :----------- | | ||
| 975 | +| :-------------------------- | :-------------- | :----------- | | ||
| 1003 | 976 | | clk | input | All signal timings are related to the rising edge of clk. | | |
| 1004 | 977 | | reset_n | input | The reset signal is active LOW and resets the core. This is the only active LOW signal. | | |
| 1005 | 978 | | init | input | The core is initialized and processes the key and the first block of the message. | | |
| 1006 | 979 | | next | input | The core processes the rest of the message blocks using the result from the previous blocks. | | |
| 1007 | 980 | | zeroize | input | The core clears all internal registers to avoid any SCA information leakage. | | |
| 1008 | -| key\[383:0\] | input | The input key. | | ||
| 981 | +| csr_mode | input | When set, the key comes from the cptra_csr_hmac_key interface pins. This key is valid only during MANUFACTURING mode. | | ||
| 982 | +| mode | input | Indicates the HMAC type of the function. This can be: <br>- HMAC384 <br>- HMAC512. | | ||
| 983 | +| cptra_csr_hmac_key\[511:0\] | input | The key to be used during CSR mode. | | ||
| 984 | +| key\[511:0\] | input | The input key. | | ||
| 1009 | 985 | | block\[1023:0\] | input | The input padded block of message. | | |
| 1010 | -| LFSR_seed\[159:0\] | Input | The input to seed PRNG to enable the masking countermeasure for SCA protection. | | ||
| 986 | +| LFSR_seed\[383:0\] | Input | The input to seed PRNG to enable the masking countermeasure for SCA protection. | | ||
| 1011 | 987 | | ready | output | When HIGH, the signal indicates the core is ready. | | |
| 1012 | -| tag\[383:0\] | output | The HMAC value of the given key or block. For PRF-HMAC-SHA-384, a 384-bit tag is required. For HMAC-SHA-384-192, the host is responsible for reading 192 bits from the MSB. | | ||
| 988 | +| tag\[511:0\] | output | The HMAC value of the given key or block. For PRF-HMAC-SHA-512, a 512-bit tag is required. For HMAC-SHA-512-256, the host is responsible for reading 256 bits from the MSB. | | ||
| 1013 | 989 | | tag_valid | output | When HIGH, the signal indicates the result is ready. | | |
| 1014 | 990 | ||
| 1015 | 991 | ||
| @@ -1021,7 +997,7 @@ | |||
| 1021 | 997 | ||
| 1022 | 998 | The following pseudocode demonstrates how the HMAC interface can be implemented. | |
| 1023 | 999 | ||
| 1024 | -*Figure 30: HMAC pseudocode* | ||
| 1000 | +*Figure 31: HMAC pseudocode* | ||
| 1025 | 1001 | ||
| 1026 | 1002 |  | |
| 1027 | 1003 | ||
| @@ -1033,7 +1009,7 @@ | |||
| 1033 | 1009 | ||
| 1034 | 1010 | The embedded countermeasures are based on "Differential Power Analysis of HMAC Based on SHA-2, and Countermeasures" by McEvoy et al. To provide the required random values for masking intermediate values, a lightweight 74-bit LFSR is implemented. Based on “Spin Me Right Round Rotational Symmetry for FPGA-specific AES” by Wegener et al., LFSR is sufficient for masking statistical randomness. | |
| 1035 | 1011 | ||
| 1036 | -Each round of SHA512 execution needs 6,432 random bits, while one HMAC operation needs at least 4 rounds of SHA512 operations. However, the proposed architecture requires only 160-bit LFSR seed and provides first-order DPA attack protection at the cost of 10% latency overhead with negligible hardware resource overhead. | ||
| 1012 | +Each round of SHA512 execution needs 6,432 random bits, while one HMAC operation needs at least 4 rounds of SHA512 operations. However, the proposed architecture requires only a 384-bit LFSR seed and provides first-order DPA attack protection at the cost of 10% latency overhead with negligible hardware resource overhead. | ||
| 1037 | 1013 | ||
| 1038 | 1014 | ### Performance | |
| 1039 | 1015 | ||
| @@ -1054,9 +1030,9 @@ | |||
| 1054 | 1030 | | 128 KiB message | 207,979 | 519.947 | 1,923 | | |
| 1055 | 1031 | ||
| 1056 | 1032 | ||
| 1057 | -#### Hardware/software architecture | ||
| 1058 | - | ||
| 1059 | -In this architecture, the HMAC interface and controller are implemented in RISC-V core. The performance specification of the HMAC architecture is reported as shown in the following table. | ||
| 1033 | +#### Hardware/software architecture | ||
| 1034 | + | ||
| 1035 | +In this architecture, the HMAC interface and controller are implemented in RISC-V core. The performance specification of the HMAC architecture is reported as shown in the following table. | ||
| 1060 | 1036 | ||
| 1061 | 1037 | | Operation | Cycle count \[CCs\] | Time \[us\] @ 400 MHz | Throughput \[op/s\] | | |
| 1062 | 1038 | | :-------------------- | :------------------ | :-------------------- | :------------------ | | |
| @@ -1090,7 +1066,7 @@ | |||
| 1090 | 1066 | ||
| 1091 | 1067 | 1. Set V_init = 0x01 0x01 0x01 ... 0x01 (V has 384-bit) | |
| 1092 | 1068 | 2. Set K_init = 0x00 0x00 0x00 ... 0x00 (K has 384-bit) | |
| 1093 | - 3. K_tmp = HMAC(K_init, V_init || 0x00 || entropy || nonce) | ||
| 1069 | + 3. K_tmp = HMAC(K_init, V_init || 0x00 || entropy || nonce) | ||
| 1094 | 1070 | 4. V_tmp = HMAC(K_tmp, V_init) | |
| 1095 | 1071 | 5. K_new = HMAC(K_tmp, V_tmp || 0x01 || entropy || nonce) | |
| 1096 | 1072 | 6. V_new = HMAC(K_new, V_tmp) | |
| @@ -1138,13 +1114,15 @@ | |||
| 1138 | 1114 | ||
| 1139 | 1115 | ## ECC | |
| 1140 | 1116 | ||
| 1141 | -The ECC unit includes the ECDSA (Elliptic Curve Digital Signature Algorithm) engine, offering a variant of the cryptographically secure Digital Signature Algorithm (DSA), which uses elliptic curve (ECC). A digital signature is an authentication method in which a public key pair and a digital certificate are used as a signature to verify the identity of a recipient or sender of information. | ||
| 1117 | +The ECC unit includes the ECDSA (Elliptic Curve Digital Signature Algorithm) engine and the ECDH (Elliptic Curve Diffie-Hellman Key-Exchange) engine, offering elliptic curve (ECC) variants of the cryptographically secure Digital Signature Algorithm (DSA) and Diffie-Hellman Key-Exchange (DH). A digital signature is an authentication method in which a public key pair and a digital certificate are used as a signature to verify the identity of a recipient or sender of information. | ||
| 1142 | 1118 | ||
| 1143 | 1119 | The hardware implementation supports deterministic ECDSA, 384 Bits (Prime Field), also known as NIST-Secp384r1, described in RFC6979. | |
| 1144 | 1120 | ||
| 1121 | +The hardware implementation also supports ECDH, 384 Bits (Prime Field), also known as NIST-Secp384r1, described in SP800-56A. | ||
| 1122 | + | ||
| 1145 | 1123 | Secp384r1 parameters are shown in the following figure. | |
| 1146 | 1124 | ||
| 1147 | -*Figure 31: Secp384r1 parameters* | ||
| 1125 | +*Figure 32: Secp384r1 parameters* | ||
| 1148 | 1126 | ||
| 1149 | 1127 |  | |
| 1150 | 1128 | ||
| @@ -1152,9 +1130,11 @@ | |||
| 1152 | 1130 | ||
| 1153 | 1131 | The ECDSA consists of three operations, shown in the following figure. | |
| 1154 | 1132 | ||
| 1155 | -*Figure 32: ECDSA operations* | ||
| 1133 | +*Figure 33: ECDSA operations* | ||
| 1156 | 1134 | ||
| 1157 | 1135 |  | |
| 1136 | + | ||
| 1137 | +In addition, ECDH consists of one operation: sharedkey generation. | ||
| 1158 | 1138 | ||
| 1159 | 1139 | #### KeyGen | |
| 1160 | 1140 | ||
| @@ -1166,7 +1146,7 @@ | |||
| 1166 | 1146 | ||
| 1167 | 1147 | #### Signing | |
| 1168 | 1148 | ||
| 1169 | -In the signing algorithm, a signature (r, s) is generated by Sign(privKey, h), taking a privKey and hash of message m, h = hash(m), using a cryptographic hash function, SHA384. The signing algorithm includes: | ||
| 1149 | +In the signing algorithm, a signature (r, s) is generated by Sign(privKey, h), taking a privKey and hash of message m, h = hash(m), using a cryptographic hash function, SHA512. The signing algorithm includes: | ||
| 1170 | 1150 | ||
| 1171 | 1151 | * Generate a random number k in the range [1..n-1], while k = HMAC\_DRBG(privKey, h) | |
| 1172 | 1152 | * Calculate the random point R = k × G | |
| @@ -1176,24 +1156,32 @@ | |||
| 1176 | 1156 | ||
| 1177 | 1157 | #### Verifying | |
| 1178 | 1158 | ||
| 1179 | -The signature (r, s) can be verified by Verify(pubKey ,h ,r, s) considering the public key pubKey and hash of message m, h=hash(m) using the same cryptographic hash function SHA384. The output is r’ value of verifying a signature. The ECDSA verify algorithm includes: | ||
| 1159 | +The signature (r, s) can be verified by Verify(pubKey, h, r, s), considering the public key pubKey and the hash of message m, h = hash(m), using the same cryptographic hash function SHA512. The output is the r’ value of verifying a signature. The ECDSA verify algorithm includes: | ||
| 1180 | 1160 | ||
| 1181 | 1161 | * Calculate s1 = s<sup>−1</sup> mod n | |
| 1182 | 1162 | * Compute R' = (h × s1) × G + (r × s1) × pubKey | |
| 1183 | 1163 | * Take r’ = R'x mod n, while R'x is x coordinate of R’=(R'x, R'y) | |
| 1184 | 1164 | * Verify the signature by comparing whether r' == r | |
| 1185 | 1165 | ||
| 1166 | +#### ECDH sharedkey | ||
| 1167 | + | ||
| 1168 | +In ECDH sharedkey generation, the shared key is generated by ECDH_sharedkey(privKey_A, pubKey_B), taking the party's own private key and the other party's public key. The ECDH sharedkey algorithm is as follows: | ||
| 1169 | + | ||
| 1170 | +* Compute P = privKey_A × pubKey_B, where P = (Px, Py) is a point on the elliptic curve. | ||
| 1171 | +* Output sharedkey = Px, where Px is the x coordinate of P. | ||
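
The two steps above can be tried out on a toy curve. The sketch below deliberately does not use Secp384r1: it uses the small textbook curve y² = x³ + 2x + 2 over F₁₇ with base point G = (5, 1), which is an assumption made purely to keep the numbers readable. It is not constant-time and has none of the countermeasures described in this document.

```python
P_MOD = 17      # toy field prime (NOT the Secp384r1 prime)
A_COEFF = 2     # curve: y^2 = x^3 + 2x + 2 over F_17
G = (5, 1)      # toy base point


def point_add(p1, p2):
    """Add two affine points; None represents the point at infinity."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    x1, y1 = p1
    x2, y2 = p2
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return None  # p2 is the inverse of p1
    if p1 == p2:
        lam = (3 * x1 * x1 + A_COEFF) * pow((2 * y1) % P_MOD, -1, P_MOD)
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD)
    lam %= P_MOD
    x3 = (lam * lam - x1 - x2) % P_MOD
    y3 = (lam * (x1 - x3) - y1) % P_MOD
    return (x3, y3)


def scalar_mult(k, point):
    """Compute k x point with plain double-and-add (illustration only)."""
    result = None
    addend = point
    while k:
        if k & 1:
            result = point_add(result, addend)
        addend = point_add(addend, addend)
        k >>= 1
    return result


def ecdh_sharedkey(priv_key_a, pub_key_b):
    """Return Px, the x coordinate of P = privKey_A x pubKey_B."""
    shared_point = scalar_mult(priv_key_a, pub_key_b)
    if shared_point is None:
        raise ValueError("shared point is the point at infinity")
    return shared_point[0]


# Two parties with example private keys derive the same shared key.
priv_a, priv_b = 3, 7
pub_a, pub_b = scalar_mult(priv_a, G), scalar_mult(priv_b, G)
shared_a = ecdh_sharedkey(priv_a, pub_b)
shared_b = ecdh_sharedkey(priv_b, pub_a)
```

Both parties arrive at the same x coordinate because privKey_A × (privKey_B × G) and privKey_B × (privKey_A × G) are the same point.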
| 1172 | + | ||
| 1173 | + | ||
| 1186 | 1174 | ### Architecture | |
| 1187 | 1175 | ||
| 1188 | 1176 | The ECC top-level architecture is shown in the following figure. | |
| 1189 | 1177 | ||
| 1190 | -*Figure 33: ECDSA architecture* | ||
| 1191 | - | ||
| 1192 | - | ||
| 1178 | +*Figure 34: ECC architecture* | ||
| 1179 | + | ||
| 1180 | + | ||
| 1193 | 1181 | ||
| 1194 | 1182 | ### Signal descriptions | |
| 1195 | 1183 | ||
| 1196 | -The ECDSA architecture inputs and outputs are described in the following table. | ||
| 1184 | +The ECC architecture inputs and outputs are described in the following table. | ||
| 1197 | 1185 | ||
| 1198 | 1186 | ||
| 1199 | 1187 | | Name | Input or output | Description | | |
| @@ -1206,49 +1194,56 @@ | |||
| 1206 | 1194 | | nonce \[383:0\] | input | The deterministic nonce for HMAC_DRBG in the KeyGen operation. | | |
| 1207 | 1195 | | privKey_in\[383:0\] | input | The input private key used in the signing operation. | | |
| 1208 | 1196 | | pubKey_in\[1:0\]\[383:0\] | input | The input public key(x,y) used in the verifying operation. | | |
| 1209 | -| hashed_msg\[383:0\] | input | The hash of message using SHA384. | | ||
| 1197 | +| hashed_msg\[383:0\] | input | The hash of message using SHA512. | | ||
| 1210 | 1198 | | ready | output | When HIGH, the signal indicates the core is ready. | | |
| 1211 | 1199 | | privKey_out\[383:0\] | output | The generated private key in the KeyGen operation. | | |
| 1212 | 1200 | | pubKey_out\[1:0\]\[383:0\] | output | The generated public key(x,y) in the KeyGen operation. | | |
| 1213 | 1201 | r\[383:0\] | output | The signature value of the given privKey/message. | |
| 1214 | 1202 | s\[383:0\] | output | The signature value of the given privKey/message. | |
| 1215 | 1203 | | r’\[383:0\] | Output | The signature verification result. | | |
| 1204 | +| DH_sharedkey\[383:0\] | output | The generated shared key in the ECDH sharedkey operation. | | ||
| 1216 | 1205 | | valid | output | When HIGH, the signal indicates the result is ready. | | |
| 1217 | 1206 | ||
| 1218 | 1207 | ||
| 1219 | 1208 | ### Address map | |
| 1220 | 1209 | ||
| 1221 | -The ECDSA address map is shown here: [ecc\_reg — clp Reference (chipsalliance.github.io)](https://chipsalliance.github.io/caliptra-rtl/main/internal-regs/?p=clp.ecc_reg). | ||
| 1210 | +The ECC address map is shown here: [ecc\_reg — clp Reference (chipsalliance.github.io)](https://chipsalliance.github.io/caliptra-rtl/main/internal-regs/?p=clp.ecc_reg). | ||
| 1222 | 1211 | ||
| 1223 | 1212 | ### Pseudocode | |
| 1224 | 1213 | ||
| 1225 | -The following pseudocode blocks demonstrate example implementations for KeyGen, Signing, and Verifying. | ||
| 1214 | +The following pseudocode blocks demonstrate example implementations for KeyGen, Signing, Verifying, and ECDH sharedkey. | ||
| 1226 | 1215 | ||
| 1227 | 1216 | #### KeyGen | |
| 1228 | 1217 | ||
| 1229 | -*Figure 34: KeyGen pseudocode* | ||
| 1218 | +*Figure 35: KeyGen pseudocode* | ||
| 1230 | 1219 | ||
| 1231 | 1220 |  | |
| 1232 | 1221 | ||
| 1233 | 1222 | #### Signing | |
| 1234 | 1223 | ||
| 1235 | -*Figure 35: Signing pseudocode* | ||
| 1224 | +*Figure 36: Signing pseudocode* | ||
| 1236 | 1225 | ||
| 1237 | 1226 |  | |
| 1238 | 1227 | ||
| 1239 | 1228 | #### Verifying | |
| 1240 | 1229 | ||
| 1241 | -*Figure 36: Verifying pseudocode* | ||
| 1230 | +*Figure 37: Verifying pseudocode* | ||
| 1242 | 1231 | ||
| 1243 | 1232 |  | |
| 1244 | 1233 | ||
| 1234 | +#### ECDH sharedkey | ||
| 1235 | + | ||
| 1236 | +*Figure 38: ECDH sharedkey pseudocode* | ||
| 1237 | + | ||
| 1238 | + | ||
| 1239 | + | ||
| 1245 | 1240 | ### SCA countermeasure | |
| 1246 | 1241 | ||
| 1247 | -The described ECDSA has three main routines: KeyGen, Signing, and Verifying. Since the Verifying routine requires operation with public values rather than a secret value, our side-channel analysis does not cover this routine. Our evaluation covers the KeyGen and Signing routines where the secret values are processed. | ||
| 1248 | - | ||
| 1249 | -KeyGen consists of HMAC DRBG and scalar multiplication, while Signing first requires a message hashing and then follows the same operations as KeyGen (HMAC DRBG and scalar multiplication). The last step of Signing is generating “S” as the proof of signature. Since HMAC DRBG and hash operations are evaluated separately in our document, this evaluation covers scalar multiplication and modular arithmetic operations. | ||
| 1250 | - | ||
| 1251 | -#### Scalar multiplication | ||
| 1242 | +The described ECC has four main routines: KeyGen, Signing, Verifying, and ECDH sharedkey. Since the Verifying routine requires operation with public values rather than a secret value, our side-channel analysis does not cover this routine. Our evaluation covers the KeyGen, Signing, and ECDH sharedkey routines where the secret values are processed. | ||
| 1243 | + | ||
| 1244 | +KeyGen consists of HMAC DRBG and scalar multiplication, while Signing first requires a message hashing and then follows the same operations as KeyGen (HMAC DRBG and scalar multiplication). The last step of Signing is generating “S” as the proof of signature. Since HMAC DRBG and hash operations are evaluated separately in our document, this evaluation covers scalar multiplication and modular arithmetic operations. | ||
| 1245 | + | ||
| 1246 | +#### Scalar multiplication | ||
| 1252 | 1247 | ||
| 1253 | 1248 | To perform the scalar multiplication, the Montgomery ladder is implemented, which is inherently resistant to timing and single power analysis (SPA) attacks. | |
| 1254 | 1249 | ||
| @@ -1256,7 +1251,7 @@ | |||
| 1256 | 1251 | ||
| 1257 | 1252 | To protect the architecture against horizontal power/electromagnetic (EM) and differential power analysis (DPA) attacks, several countermeasures are embedded in the design [9]. Since these countermeasures require random inputs, HMAC-DRBG is fed by IV to generate these random values. | |
| 1258 | 1253 | ||
| 1259 | -Since HMAC-DRBG generates random value in a deterministic way, firmware MUST feed different IV to ECC engine for EACH keygen and signing operation. | ||
| 1254 | +Since HMAC-DRBG generates random values in a deterministic way, firmware MUST feed a different IV to the ECC engine for EACH keygen, signing, and ECDH sharedkey operation. | ||
| 1260 | 1255 | ||
| 1261 | 1256 | #### Base point randomization | |
| 1262 | 1257 | ||
| @@ -1284,7 +1279,7 @@ | |||
| 1284 | 1279 | ||
| 1285 | 1280 | Generating “S” as the proof of signature at the steps of the signing operation leaks where the hashed message is signed with private key and ephemeral key as follows: | |
| 1286 | 1281 | ||
| 1287 | -Since the given message is known or the signature part r is known, the attacker can perform a known-plaintext attack. The attacker can sign multiple messages with the same key, or the attacker can observe part of the signature that is generated with multiple messages but the same key. | ||
| 1282 | +Since the given message is known or the signature part r is known, the attacker can perform a known-plaintext attack. The attacker can sign multiple messages with the same key, or the attacker can observe part of the signature that is generated with multiple messages but the same key. | ||
| 1288 | 1283 | ||
| 1289 | 1284 | The evaluation shows that the CPA attack can be performed with a small number of traces. Thus, an arithmetic masked design for these operations is implemented. | |
| 1290 | 1285 | ||
| @@ -1292,7 +1287,7 @@ | |||
| 1292 | 1287 | ||
| 1293 | 1288 | This countermeasure is achieved by randomizing the privkey as follows: | |
| 1294 | 1289 | ||
| 1295 | -Although computation of “S” seems the most vulnerable point in our scheme, the operation does not have a big contribution to overall latency. Hence, masking these operations has low overhead on the cost of the design. | ||
| 1290 | +Although computation of “S” seems the most vulnerable point in our scheme, the operation does not have a big contribution to overall latency. Hence, masking these operations has low overhead on the cost of the design. | ||
| 1296 | 1291 | ||
| 1297 | 1292 | #### Random number generator for SCA countermeasure | |
| 1298 | 1293 | ||
| @@ -1304,7 +1299,7 @@ | |||
| 1304 | 1299 | 2. KEYGEN PRIVKEY: Running HMAC\_DRBG with seed and nonce to generate the privkey in KEYGEN operation. | |
| 1305 | 1300 | 3. SIGNING NONCE: Running HMAC\_DRBG based on RFC6979 in SIGNING operation with privkey and hashed\_msg. | |
| 1306 | 1301 | ||
| 1307 | -*Figure 37: HMAC\_DRBG utilization* | ||
| 1302 | +*Figure 39: HMAC\_DRBG utilization* | ||
| 1308 | 1303 | ||
| 1309 | 1304 |  | |
| 1310 | 1305 | ||
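
The SIGNING NONCE step above refers to RFC 6979. As a rough software illustration only, the nonce derivation from RFC 6979 section 3.2 can be sketched as follows. The choice of the P-384 group order with SHA-384 and the function name `rfc6979_nonce` are assumptions made for this sketch; the hardware HMAC\_DRBG may differ in its details.

```python
import hashlib
import hmac

# Assumption for this sketch: NIST P-384 group order, paired with SHA-384.
P384_ORDER = int(
    "ffffffffffffffffffffffffffffffffffffffffffffffff"
    "c7634d81f4372ddf581a0db248b0a77aecec196accc52973", 16)


def rfc6979_nonce(privkey: int, hashed_msg: bytes, q: int = P384_ORDER,
                  hash_name: str = "sha384") -> int:
    """Derive a deterministic signing nonce k following RFC 6979, section 3.2."""
    qlen = q.bit_length()
    rlen = (qlen + 7) // 8
    hlen = hashlib.new(hash_name).digest_size

    def bits2int(data: bytes) -> int:
        # Keep only the leftmost qlen bits of the input.
        value = int.from_bytes(data, "big")
        excess = len(data) * 8 - qlen
        return value >> excess if excess > 0 else value

    def mac(key: bytes, data: bytes) -> bytes:
        return hmac.new(key, data, hash_name).digest()

    x = privkey.to_bytes(rlen, "big")                       # int2octets(x)
    h1 = (bits2int(hashed_msg) % q).to_bytes(rlen, "big")   # bits2octets(h1)

    # Initialise the HMAC_DRBG state (V, K) and mix in privkey and hashed_msg.
    v = b"\x01" * hlen
    k = b"\x00" * hlen
    k = mac(k, v + b"\x00" + x + h1)
    v = mac(k, v)
    k = mac(k, v + b"\x01" + x + h1)
    v = mac(k, v)

    # Generate candidates until one falls in [1, q - 1].
    while True:
        t = b""
        while len(t) * 8 < qlen:
            v = mac(k, v)
            t += v
        candidate = bits2int(t)
        if 1 <= candidate < q:
            return candidate
        k = mac(k, v + b"\x00")
        v = mac(k, v)
```

The same `privkey` and `hashed_msg` always yield the same nonce, so no fresh randomness is needed to derive it.
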
| @@ -1320,7 +1315,7 @@ | |||
| 1320 | 1315 | ||
| 1321 | 1316 | The data flow of the HMAC\_DRBG operation in keygen operation mode is shown in the following figure. | |
| 1322 | 1317 | ||
| 1323 | -*Figure 38: HMAC\_DRBG data flow* | ||
| 1318 | +*Figure 40: HMAC\_DRBG data flow* | ||
| 1324 | 1319 | ||
| 1325 | 1320 |  | |
| 1326 | 1321 | ||
| @@ -1330,7 +1325,7 @@ | |||
| 1330 | 1325 | ||
| 1331 | 1326 | In practice, observing a t-value greater than a specific threshold (commonly 4.5) indicates the presence of leakage. However, in ECC, due to its latency, around 5 million samples must be captured per trace. This large number of samples leads to many false positives, so a TVLA threshold higher than 4.5 can be used. Based on the following figure from “Side-Channel Analysis and Countermeasure Design for Implementation of Curve448 on Cortex-M4” by Bisheh-Niasar et al., the threshold can be considered equal to 7 in our case. | |
| 1332 | 1327 | ||
| 1333 | -*Figure 39: TVLA threshold as a function of the number of samples per trace* | ||
| 1328 | +*Figure 41: TVLA threshold as a function of the number of samples per trace* | ||
| 1334 | 1329 | ||
| 1335 | 1330 |  | |
| 1336 | 1331 | ||
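
To make the thresholding concrete, the following is a minimal sketch of a TVLA-style check based on Welch's t-test, evaluated independently at every sample point of two trace sets. The function names and the list-of-lists trace layout are assumptions for illustration; this is not the evaluation tooling used for the results in this section.

```python
import math
import statistics


def welch_t(set_a, set_b):
    """Welch's t-statistic between two groups of measurements."""
    mean_a, mean_b = statistics.fmean(set_a), statistics.fmean(set_b)
    var_a, var_b = statistics.variance(set_a), statistics.variance(set_b)
    denom = math.sqrt(var_a / len(set_a) + var_b / len(set_b))
    if denom == 0.0:
        # Degenerate case: no variance in either group.
        return 0.0 if mean_a == mean_b else math.copysign(math.inf, mean_a - mean_b)
    return (mean_a - mean_b) / denom


def tvla_leaky_samples(traces_a, traces_b, threshold=4.5):
    """Return the sample indices whose |t| exceeds the threshold.

    traces_a and traces_b are lists of equal-length traces (lists of floats).
    """
    leaky = []
    columns = zip(zip(*traces_a), zip(*traces_b))
    for index, (col_a, col_b) in enumerate(columns):
        if abs(welch_t(col_a, col_b)) > threshold:
            leaky.append(index)
    return leaky
```

With millions of sample points per trace, some points exceed 4.5 by chance alone, which is why a higher threshold such as 7 is passed as `threshold` in that setting.
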
| @@ -1340,7 +1335,7 @@ | |||
| 1340 | 1335 | The TVLA results for performing seed/nonce-dependent leakage detection using 200,000 traces are shown in the following figure. Based on this figure, there is no leakage in ECC keygen by changing the seed/nonce after 200,000 operations. | |
| 1341 | 1336 | ||
| 1342 | 1337 | ||
| 1343 | -*Figure 40: seed/nonce-dependent leakage detection using TVLA for ECC keygen after 200,000 traces* | ||
| 1338 | +*Figure 42: seed/nonce-dependent leakage detection using TVLA for ECC keygen after 200,000 traces* | ||
| 1344 | 1339 | ||
| 1345 | 1340 |  | |
| 1346 | 1341 | ||
| @@ -1348,13 +1343,13 @@ | |||
| 1348 | 1343 | ||
| 1349 | 1344 | The TVLA results for performing privkey-dependent leakage detection using 20,000 traces are shown in the following figure. Based on this figure, there is no leakage in ECC signing by changing the privkey after 20,000 operations. | |
| 1350 | 1345 | ||
| 1351 | -*Figure 41: privkey-dependent leakage detection using TVLA for ECC signing after 20,000 traces* | ||
| 1346 | +*Figure 43: privkey-dependent leakage detection using TVLA for ECC signing after 20,000 traces* | ||
| 1352 | 1347 | ||
| 1353 | 1348 |  | |
| 1354 | 1349 | ||
| 1355 | 1350 | The TVLA results for performing message-dependent leakage detection using 64,000 traces are shown in the following figure. Based on this figure, there is no leakage in ECC signing by changing the message after 64,000 operations. | |
| 1356 | 1351 | ||
| 1357 | -*Figure 42: Message-dependent leakage detection using TVLA for ECC signing after 64,000 traces* | ||
| 1352 | +*Figure 44: Message-dependent leakage detection using TVLA for ECC signing after 64,000 traces* | ||
| 1358 | 1353 | ||
| 1359 | 1354 |  | |
| 1360 | 1355 | ||
| @@ -1391,17 +1386,17 @@ | |||
| 1391 | 1386 | ||
| 1392 | 1387 | ## LMS Accelerator | |
| 1393 | 1388 | ||
| 1394 | -LMS cryptography is a type of hash-based digital signature scheme that was standardized by NIST in 2020. It is based on the Leighton-Micali Signature (LMS) system, which uses a Merkle tree structure to combine many one-time signature (OTS) keys into a single public key. LMS cryptography is resistant to quantum attacks and can achieve a high level of security without relying on large integer mathematics. | ||
| 1389 | +LMS cryptography is a type of hash-based digital signature scheme that was standardized by NIST in 2020. It is based on the Leighton-Micali Signature (LMS) system, which uses a Merkle tree structure to combine many one-time signature (OTS) keys into a single public key. LMS cryptography is resistant to quantum attacks and can achieve a high level of security without relying on large integer mathematics. | ||
| 1395 | 1390 | ||
| 1396 | 1391 | Caliptra supports only LMS verification using a software/hardware co-design approach. Hence, the LMS accelerator reuses the SHA256 engine to speed up the Winternitz chain by removing software-hardware interface overhead. The LMS-OTS verification algorithm is shown in the following figure: | |
| 1397 | 1392 | ||
| 1398 | -*Figure 43: LMS-OTS Verification algorithm* | ||
| 1393 | +*Figure 45: LMS-OTS Verification algorithm* | ||
| 1399 | 1394 | ||
| 1400 | 1395 |  | |
| 1401 | 1396 | ||
| 1402 | 1397 | The high-level architecture of LMS is shown in the following figure. | |
| 1403 | 1398 | ||
| 1404 | -*Figure 44: LMS high-level architecture* | ||
| 1399 | +*Figure 46: LMS high-level architecture* | ||
| 1405 | 1400 | ||
| 1406 | 1401 |  | |
| 1407 | 1402 | ||
| @@ -1426,7 +1421,7 @@ | |||
| 1426 | 1421 | ||
| 1427 | 1422 | The Winternitz hash chain can be accelerated in hardware to enhance the performance of the design. For that, a configurable architecture is proposed that can reuse the SHA256 engine. The LMS accelerator architecture is shown in the following figure, where H is the SHA256 engine. | |
| 1428 | 1423 | ||
| 1429 | -*Figure 45: Winternitz chain architecture* | ||
| 1424 | +*Figure 47: Winternitz chain architecture* | ||
| 1430 | 1425 | ||
| 1431 | 1426 |  | |
| 1432 | 1427 | ||
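
As a software reference for what the chain computes, the sketch below iterates the LM-OTS hash step as defined in RFC 8554 (Algorithm 4b), with H being SHA-256 and a 32-byte hash value. The function name and argument names are chosen for illustration and are not taken from the hardware design.

```python
import hashlib


def winternitz_chain(identifier: bytes, q: int, i: int,
                     start: int, end: int, value: bytes) -> bytes:
    """Advance one Winternitz chain from step `start` up to (not including) `end`.

    Each step computes tmp = H(I || u32str(q) || u16str(i) || u8str(j) || tmp).
    For verification, `start` is the i-th coefficient of the message digest and
    `end` is 2**w - 1 for Winternitz parameter w.
    """
    tmp = value
    for j in range(start, end):
        tmp = hashlib.sha256(
            identifier
            + q.to_bytes(4, "big")
            + i.to_bytes(2, "big")
            + j.to_bytes(1, "big")
            + tmp
        ).digest()
    return tmp
```

Because every step depends on the previous output, the iterations of one chain are inherently sequential. Keeping the loop next to the hash engine avoids a software round trip per hash call.
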
| @@ -1456,10 +1451,794 @@ | |||
| 1456 | 1451 | ||
| 1457 | 1452 | The address map for LMS accelerator integrated into SHA256 is shown here: [sha256\_reg — clp Reference (chipsalliance.github.io)](https://chipsalliance.github.io/caliptra-rtl/main/internal-regs/?p=clp.sha256_reg). | |
| 1458 | 1453 | ||
| 1454 | +## Adams Bridge - Dilithium (ML-DSA) | ||
| 1455 | + | ||
| 1456 | +Please refer to the [Adams-bridge specification](https://github.com/chipsalliance/adams-bridge/blob/main/docs/AdamsBridgeHardwareSpecification.md) | ||
| 1457 | + | ||
| 1458 | +### Address map | ||
| 1459 | +The address map of the ML-DSA accelerator is shown here: [ML-DSA\_reg — clp Reference (chipsalliance.github.io)](https://chipsalliance.github.io/caliptra-rtl/main/internal-regs/?p=clp.mldsa_reg). | ||
| 1460 | + | ||
| 1461 | +## AES | ||
| 1462 | + | ||
| 1463 | +The AES unit is a cryptographic accelerator that processes requests from the processor to encrypt or decrypt 16-byte data blocks. It supports AES-128/192/256 in various modes, including Electronic Codebook (ECB), Cipher Block Chaining (CBC), Cipher Feedback (CFB) with a fixed segment size of 128 bits (CFB-128), Output Feedback (OFB), Counter (CTR), and Galois/Counter Mode (GCM). | ||
| 1464 | + | ||
| 1465 | +The AES unit is reused from OpenTitan (see [aes](https://github.com/lowRISC/opentitan/tree/master/hw/ip/aes)) with a shim to translate from AHB-lite to the TL-UL interface. | ||
| 1466 | + | ||
| 1467 | +Additional registers have been added to support key vault integration. Keys from the key vault can be loaded into the AES unit to be used for encryption or decryption. | ||
| 1468 | + | ||
| 1469 | +### Operation | ||
| 1470 | + | ||
| 1471 | +For more information, see the [AES Programmer's Guide](https://github.com/vogelpi/opentitan/blob/aes-gcm-review/hw/ip/aes/doc/programmers_guide.md). | ||
| 1472 | + | ||
| 1473 | +### Signal descriptions | ||
| 1474 | + | ||
| 1475 | +The AES architecture inputs and outputs are described in the following table. | ||
| 1476 | + | ||
| 1477 | +| Name | Input or output | Description | | ||
| 1478 | +| :--------------------------------- | :-------------- | :----------- | | ||
| 1479 | +| clk | input | All signal timings are related to the rising edge of clk. | | ||
| 1480 | +| reset_n | input | The reset signal is active LOW and resets the core. This is the only active LOW signal. | | ||
| 1481 | +| DATA_IN | input | Input block to be encrypted or decrypted. Written in four 32-bit registers. | | ||
| 1482 | +| DATA_OUT | output | Output block result of encryption or decryption. Stored in four 32-bit registers. | | ||
| 1483 | +| CTRL_SHADOWED.MANUAL_OPERATION | input | Configures the AES core to operate in manual mode. | | ||
| 1484 | +| CTRL_SHADOWED.PRNG_RESEED_RATE | input | Configures the rate of reseeding the internal PRNG used for masking. | | ||
| 1485 | +| CTRL_SHADOWED.SIDELOAD | input | When asserted, AES core will use the key from the keyvault interface. | | ||
| 1486 | +| CTRL_SHADOWED.KEY_LEN | input | Configures the AES key length. Supports 128, 192, and 256-bit keys. | | ||
| 1487 | +| CTRL_SHADOWED.MODE | input | Configures the AES block cipher mode. | | ||
| 1488 | +| CTRL_SHADOWED.OPERATION | input | Configures the AES core to operate in encryption or decryption modes. | | ||
| 1489 | +| CTRL_GCM_SHADOWED.PHASE | input | Configures the GCM phase. | | ||
| 1490 | +| CTRL_GCM_SHADOWED.NUM_VALID_BYTES | input | Configures the number of valid bytes of the current input block in GCM. | | ||
| 1491 | +| TRIGGER.PRNG_RESEED | input | Forces a PRNG reseed. | | ||
| 1492 | +| TRIGGER.DATA_OUT_CLEAR | input | Clears the DATA_OUT registers with pseudo-random data. | | ||
| 1493 | +| TRIGGER.KEY_IV_DATA_IN_CLEAR | input | Clears the Key, IV, and DATA_IN registers with pseudo-random data. | | ||
| 1494 | +| TRIGGER.START | input | Triggers the encryption/decryption of one data block if in manual operation mode. | | ||
| 1495 | +| STATUS.ALERT_FATAL_FAULT | output | A fatal fault has occurred and the AES unit needs to be reset. | | ||
| 1496 | +| STATUS.ALERT_RECOV_CTRL_UPDATE_ERR | output | An update error has occurred in the shadowed Control Register. AES operation needs to be restarted by re-writing the Control Register. | | ||
| 1497 | +| STATUS.INPUT_READY | output | The AES unit is ready to receive new data input via the DATA_IN registers. | | ||
| 1498 | +| STATUS.OUTPUT_VALID | output | The AES unit has valid output data. | | ||
| 1499 | +| STATUS.OUTPUT_LOST | output | All previous output data has been fully read by the processor (0) or at least one previous output data block has been lost (1). It has been overwritten by the AES unit before the processor could fully read it. Once set to 1, this flag remains set until AES operation is restarted by re-writing the Control Register. The primary use of this flag is for design verification. This flag is not meaningful if MANUAL_OPERATION=0. | | ||
| 1500 | +| STATUS.STALL | output | The AES unit is stalled because there is previous output data that must be read by the processor before the AES unit can overwrite this data. This flag is not meaningful if MANUAL_OPERATION=1. | | ||
| 1501 | +| STATUS.IDLE | output | The AES unit is idle. | | ||
| 1502 | + | ||
| 1503 | + | ||
| 1504 | + | ||
| 1505 | +### Address map | ||
| 1506 | + | ||
| 1507 | +The AES address map is shown here: [aes\_clp\_reg — clp Reference (chipsalliance.github.io)](https://chipsalliance.github.io/caliptra-rtl/main/internal-regs/?p=clp.aes_clp_reg). | ||
| 1508 | + | ||
| 1509 | +### SCA countermeasures | ||
| 1510 | + | ||
| 1511 | +The AES unit employs separate SCA countermeasures for the AES cipher core used for the encryption/decryption part and for the GHASH module used for computing the integrity tag in GCM. | ||
| 1512 | + | ||
| 1513 | +### AES cipher core | ||
| 1514 | + | ||
| 1515 | +A detailed specification of the SCA countermeasure employed in the AES cipher core is shown here: [AES cipher core SCA countermeasure](https://opentitan.org/book/hw/ip/aes/doc/theory_of_operation.html#1st-order-masking-of-the-cipher-core). | ||
| 1516 | +The most critical building block of the SCA countermeasure, i.e., the masked AES S-Box, successfully passes formal masking verification at the netlist level using [Alma: Execution-aware Masking Verification](https://github.com/IAIK/coco-alma). | ||
| 1517 | +The flow required for repeating the formal masking verification using Alma together with a Howto can be found [here](https://github.com/lowRISC/opentitan/blob/master/hw/ip/aes/pre_sca/alma/README.md). | ||
| 1518 | +The entire AES cipher core, including the masked S-Boxes as well as the PRNG generating the randomness for remasking, successfully passes masking evaluation at the netlist level using [PROLEAD - A Probing-Based Leakage Detection Tool for Hardware and Software](https://github.com/ChairImpSec/PROLEAD). | ||
| 1519 | +The flow required for repeating the masking evaluation using PROLEAD together with a Howto can be found [here](https://github.com/lowRISC/opentitan/blob/aes-gcm-review/hw/ip/aes/pre_sca/prolead/README.md). | ||
| 1520 | + | ||
| 1521 | +### GHASH module | ||
| 1522 | + | ||
| 1523 | +A detailed specification of the SCA countermeasure employed in the GHASH module is shown here: [GHASH module SCA countermeasure](https://github.com/vogelpi/opentitan/blob/aes-gcm-review/hw/ip/aes/doc/theory_of_operation.md#1st-order-masking-of-the-ghash-module). | ||
| 1524 | + | ||
| 1525 | +To optimize and verify this masking countermeasure, two different types of experiments have been performed for which the results are given below. | ||
| 1526 | +1. Formal masking verification using [Alma: Execution-aware Masking Verification](https://github.com/IAIK/coco-alma). | ||
| 1527 | + These experiments led to a [series of small design optimizations](https://github.com/vogelpi/opentitan/pull/18) which have been integrated into Caliptra. | ||
| 1528 | + The resulting design successfully passes formal masking verification at the netlist level. | ||
| 1529 | +1. [Test-vector leakage assessment (TVLA)](https://www.rambus.com/wp-content/uploads/2015/08/TVLA-DTR-with-AES.pdf) applied to power SCA traces captured on a ChipWhisperer-based FPGA setup. | ||
| 1530 | + These experiments confirm the formal masking verification results: | ||
| 1531 | + No 1st-order SCA can be observed during the GHASH operation. | ||
| 1532 | + The leakage observed at the boundary of and outside the GHASH operation can be attributed to the evaluation methodology and the handling of unmasked and uncritical data, as well as to FPGA-specific leakage effects known from literature. | ||
| 1533 | + We are confident that the optimized SCA hardening concept effectively deters SCA attacks. | ||
| 1534 | + | ||
| 1535 | +#### Formal masking verification using Alma | ||
| 1536 | + | ||
| 1537 | +[Alma](https://ieeexplore.ieee.org/document/9617707) is an open source, formal masking verification tool developed at TU Graz which enables formal verification of masking SCA countermeasures at the netlist level. | ||
| 1538 | +The main advantages of this approach compared to analyzing FPGA power traces are as follows: | ||
| 1539 | + | ||
| 1540 | +* The turn-around time is much faster as it does not involve FPGA bitstream generation and capturing power traces (both can take several hours). | ||
| 1541 | +* Netlist-based analysis tools typically enable pinpointing sources of SCA leakage and easily allow analyzing sub parts of the masked design individually. | ||
| 1542 | + As a result, individual issues can be fixed up faster. | ||
| 1543 | +* The analyzed netlist is closer to the targeted ASIC implementation. | ||
| 1544 | +  During FPGA synthesis, the netlist is mapped to the logic elements such as look-up tables (LUTs) available on the selected FPGA, which are fundamentally different from simpler ASIC gates. | ||
| 1545 | + | ||
| 1546 | +However, formal netlist analysis tools may not be perfect and they also have limitations in terms of what can be analyzed. | ||
| 1547 | +For example, the maximum supported netlist size depends on the complexity and number of the non-linear elements. | ||
| 1548 | +Also, random number generators and in particular pseudo-random number generators typically need to be excluded from the analysis and random number inputs need to be assumed as ideal by tools. | ||
| 1549 | +Thus, they don’t replace FPGA-based analysis. | ||
| 1550 | +We use them to increase our confidence in our SCA countermeasures and to close countermeasure verification faster by reducing the number of FPGA evaluation runs. | ||
| 1551 | + | ||
| 1552 | +##### Prerequisites | ||
| 1553 | + | ||
| 1554 | +The [Alma-based formal masking verification flow together with a Howto](https://github.com/vogelpi/opentitan/tree/aes-gcm-review/hw/ip/aes/pre_sca/alma#readme) (including installation instructions) as well as an [open source Yosys synthesis flow](https://github.com/vogelpi/opentitan/tree/aes-gcm-review/hw/ip/aes/pre_syn) are available as open source. | ||
| 1555 | +The tool can run both on generic Yosys netlists and on proprietary, technology-specific netlists. | ||
| 1556 | +For the latter, a [slightly modified verification flow with an additional translation step](https://github.com/vogelpi/opentitan/tree/aes-gcm-review/hw/ip/aes/pre_sca/alma_post_syn#readme) is required. | ||
| 1557 | +To verify the GHASH SCA countermeasure, the generic flow was used with the following tool versions: | ||
| 1558 | + | ||
| 1559 | +* Alma ([specific commit](https://github.com/vogelpi/coco-alma/commit/68e436f67dee7d27fb782864dc5523ceb4bd27bf)) | ||
| 1560 | +* Yosys 0.36 (git sha1 8f07a0d84) | ||
| 1561 | +* sv2v v0.0.11-28-g81d8225 | ||
| 1562 | +* Verilator 4.214 2021-10-17 rev v4.214 | ||
| 1563 | + | ||
| 1564 | +##### Yosys Netlist Synthesis | ||
| 1565 | + | ||
| 1566 | +Set up the [open source Yosys synthesis flow](https://github.com/vogelpi/opentitan/tree/aes-gcm-review/hw/ip/aes/pre_syn) by copying the [`syn_setup.example.sh`](https://github.com/vogelpi/opentitan/blob/aes-gcm-review/hw/ip/aes/pre_syn/syn_setup.example.sh) file and renaming it to `syn_setup.sh`. | ||
| 1567 | +Change the `LR_SYNTH_TOP_MODULE` variable to `aes_ghash_wrap` and the `LR_SYNTH_CELL_LIBRARY_PATH` to the `NangateOpenCellLibrary_typical.lib` file in the folder where you installed the nangate45 library. | ||
| 1568 | + | ||
| 1569 | +Then, start the synthesis by executing | ||
| 1570 | + | ||
| 1571 | +```sh | ||
| 1572 | +./syn_yosys.sh | ||
| 1573 | +``` | ||
| 1574 | +This should produce output similar to what is shown below: | ||
| 1575 | + | ||
| 1576 | +``` | ||
| 1577 | +8. Printing statistics. | ||
| 1578 | + | ||
| 1579 | +=== aes_ghash_wrap === | ||
| 1580 | + | ||
| 1581 | + Number of wires: 24543 | ||
| 1582 | + Number of wire bits: 29339 | ||
| 1583 | + Number of public wires: 567 | ||
| 1584 | + Number of public wire bits: 5363 | ||
| 1585 | + Number of memories: 0 | ||
| 1586 | + Number of memory bits: 0 | ||
| 1587 | + Number of processes: 0 | ||
| 1588 | + Number of cells: 26214 | ||
| 1589 | + AND2_X1 1585 | ||
| 1590 | + AND3_X1 4 | ||
| 1591 | + AND4_X1 32 | ||
| 1592 | + AOI211_X1 58 | ||
| 1593 | + AOI21_X1 293 | ||
| 1594 | + AOI221_X1 215 | ||
| 1595 | + AOI22_X1 364 | ||
| 1596 | + DFFR_X1 1468 | ||
| 1597 | + DFFS_X1 5 | ||
| 1598 | + INV_X1 584 | ||
| 1599 | + MUX2_X1 1252 | ||
| 1600 | + NAND2_X1 1870 | ||
| 1601 | + NAND3_X1 128 | ||
| 1602 | + NAND4_X1 37 | ||
| 1603 | + NOR2_X1 7551 | ||
| 1604 | + NOR3_X1 445 | ||
| 1605 | + NOR4_X1 28 | ||
| 1606 | + OAI211_X1 98 | ||
| 1607 | + OAI21_X1 827 | ||
| 1608 | + OAI221_X1 3 | ||
| 1609 | + OAI22_X1 183 | ||
| 1610 | + OR2_X1 28 | ||
| 1611 | + OR3_X1 67 | ||
| 1612 | + OR4_X1 2 | ||
| 1613 | + XNOR2_X1 7122 | ||
| 1614 | + XOR2_X1 1965 | ||
| 1615 | + | ||
| 1616 | + Chip area for module '\aes_ghash_wrap': 37534.728000 | ||
| 1617 | + | ||
| 1618 | +====== End Yosys Stat Report ====== | ||
| 1619 | + | ||
| 1620 | +Warnings: 20 unique messages, 102 total | ||
| 1621 | + | ||
| 1622 | +End of script. Logfile hash: 16c4d13569, CPU: user 25.11s system 0.12s, MEM: 176.29 MB peak | ||
| 1623 | +Yosys 0.36 (git sha1 8f07a0d84, gcc 11.4.0-1ubuntu1~22.04 -fPIC -Os) | ||
| 1624 | +Time spent: 66% 2x abc (47 sec), 9% 40x opt_expr (6 sec), ... | ||
| 1625 | +Area in kGE = 47.04 | ||
| 1626 | +``` | ||
| 1627 | + | ||
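
The last line of the report can be cross-checked by hand. The sketch below assumes that one gate equivalent (GE) corresponds to the area of a `NAND2_X1` cell, taken here as 0.798 µm² for the nangate45 typical library. That constant is an assumption of this sketch and is not stated in the report itself. The cell counts are copied from the report above.

```python
# Assumption: one gate equivalent equals the NAND2_X1 cell area in nangate45.
NAND2_X1_AREA_UM2 = 0.798


def area_in_kge(chip_area_um2, nand2_area_um2=NAND2_X1_AREA_UM2):
    """Convert a reported chip area to kilo gate equivalents."""
    return chip_area_um2 / nand2_area_um2 / 1000.0


# Per-cell counts as printed in the Yosys statistics.
cell_counts = {
    "AND2_X1": 1585, "AND3_X1": 4, "AND4_X1": 32, "AOI211_X1": 58,
    "AOI21_X1": 293, "AOI221_X1": 215, "AOI22_X1": 364, "DFFR_X1": 1468,
    "DFFS_X1": 5, "INV_X1": 584, "MUX2_X1": 1252, "NAND2_X1": 1870,
    "NAND3_X1": 128, "NAND4_X1": 37, "NOR2_X1": 7551, "NOR3_X1": 445,
    "NOR4_X1": 28, "OAI211_X1": 98, "OAI21_X1": 827, "OAI221_X1": 3,
    "OAI22_X1": 183, "OR2_X1": 28, "OR3_X1": 67, "OR4_X1": 2,
    "XNOR2_X1": 7122, "XOR2_X1": 1965,
}

total_cells = sum(cell_counts.values())
flip_flops = cell_counts["DFFR_X1"] + cell_counts["DFFS_X1"]

print(f"Area in kGE = {area_in_kge(37534.728):.2f}")
print(f"Total cells = {total_cells}, flip-flops = {flip_flops}")
```

Under this assumption the conversion reproduces the reported figure, and the per-cell counts add up to the reported number of cells.
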
| 1628 | +Note that the reported area is considerably larger than the number reported in the [GHASH SCA countermeasure specification](https://github.com/vogelpi/opentitan/blob/aes-gcm-review/hw/ip/aes/doc/theory_of_operation.md#1st-order-masking-of-the-ghash-module). | ||
| 1629 | +The reasons are twofold: | ||
| 1630 | + | ||
| 1631 | +1. The `aes_ghash_wrap` module synthesized is a wrapper module around the GHASH module in focus of this analysis. | ||
| 1632 | +   The goal of the wrapper is to separately feed in secrets (the hash subkey H and the encrypted initial counter block S) as well as randomness in a tool-aware manner. | ||
| 1633 | + As such, the wrapper includes some additional muxing resources and a counter to ease interpretation of results. | ||
| 1634 | +2. To speed up the formal analysis, the pipelined Galois-field multipliers have been instantiated with a latency of 4 instead of 32 clock cycles as on FPGA. | ||
| 1635 | +   While the latency, or more precisely the processing parallelism, does have an impact on the SNR, it does not have an impact on the formal netlist analysis, which is performed in an essentially noise-free environment. | ||
| 1636 | + | ||
| 1637 | +##### Formal Netlist Analysis | ||
| 1638 | + | ||
| 1639 | +After synthesizing the netlist, the following steps should be taken to perform the analysis: | ||
| 1640 | + | ||
| 1641 | +1. Make sure to source the `build_consts.sh` script | ||
| 1642 | + ```sh | ||
| 1643 | + source util/build_consts.sh | ||
| 1644 | + ``` | ||
| 1645 | + in order to set up some shell variables. | ||
| 1646 | + | ||
| 1647 | +1. Enter the directory where you have downloaded Alma and load the virtual Python environment | ||
| 1648 | + ```sh | ||
| 1649 | + source dev/bin/activate | ||
| 1650 | + ``` | ||
| 1651 | + | ||
| 1652 | +1. Launch the Alma tool to parse, trace (simulate) and formally verify the netlist. | ||
| 1653 | + For simplicity, a single script is provided to launch all the required steps with a single command. | ||
| 1654 | + Simply run | ||
| 1655 | + ```sh | ||
| 1656 | + ${REPO_TOP}/hw/ip/aes/pre_sca/alma/verify_aes_ghash.sh | ||
| 1657 | + ``` | ||
| 1658 | + This should produce output similar to the one below: | ||
| 1659 | + ``` | ||
| 1660 | + Verifying aes_ghash_wrap using Alma | ||
| 1661 | + Starting yosys synthesis... | ||
| 1662 | + | CircuitGraph | Total: 29882 | Linear: 9091 | Non-linear: 12741 | Registers: 1473 | Mux: 3538 | | ||
| 1663 | + parse.py successful (47.99s) | ||
| 1664 | + 1: Running verilator on given netlist | ||
| 1665 | + 2: Compiling verilated netlist library | ||
| 1666 | + 3: Compiling provided verilator testbench | ||
| 1667 | + 4: Simulating circuit and generating VCD | ||
| 1668 | + | CircuitGraph | Total: 29882 | Linear: 9091 | Non-linear: 12741 | Registers: 1473 | Mux: 3538 | | ||
| 1669 | + tmp/tmp.vcd:24765: [WARNING] Entry for name alert_fatal_i already exists in namemap (alert_fatal_i -> Ce") | ||
| 1670 | + tmp/tmp.vcd:24766: [WARNING] Entry for name alert_o already exists in namemap (alert_o -> De") | ||
| 1671 | + tmp/tmp.vcd:24767: [WARNING] Entry for name clear_i already exists in namemap (clear_i -> Ee") | ||
| 1672 | + tmp/tmp.vcd:24768: [WARNING] Entry for name clk_i already exists in namemap (clk_i -> Fe") | ||
| 1673 | + tmp/tmp.vcd:24770: [WARNING] Entry for name cyc_ctr_o already exists in namemap (cyc_ctr_o -> Ge") | ||
| 1674 | + tmp/tmp.vcd:24771: [WARNING] Entry for name data_in_prev_i already exists in namemap (data_in_prev_i -> He") | ||
| 1675 | + tmp/tmp.vcd:24772: [WARNING] Entry for name data_out_i already exists in namemap (data_out_i -> Le") | ||
| 1676 | + tmp/tmp.vcd:24773: [WARNING] Entry for name first_block_o already exists in namemap (first_block_o -> Pe") | ||
| 1677 | + tmp/tmp.vcd:24774: [WARNING] Entry for name gcm_phase_i already exists in namemap (gcm_phase_i -> Qe") | ||
| 1678 | + tmp/tmp.vcd:24775: [WARNING] Entry for name ghash_state_done_o already exists in namemap (ghash_state_done_o -> Re") | ||
| 1679 | + tmp/tmp.vcd:24776: [WARNING] Entry for name hash_subkey_i already exists in namemap (hash_subkey_i -> Ve") | ||
| 1680 | + tmp/tmp.vcd:24777: [WARNING] Entry for name in_ready_o already exists in namemap (in_ready_o -> ^e") | ||
| 1681 | + tmp/tmp.vcd:24778: [WARNING] Entry for name in_valid_i already exists in namemap (in_valid_i -> _e") | ||
| 1682 | + tmp/tmp.vcd:24779: [WARNING] Entry for name load_hash_subkey_i already exists in namemap (load_hash_subkey_i -> `e") | ||
| 1683 | + tmp/tmp.vcd:24780: [WARNING] Entry for name num_valid_bytes_i already exists in namemap (num_valid_bytes_i -> ae") | ||
| 1684 | + tmp/tmp.vcd:24781: [WARNING] Entry for name op_i already exists in namemap (op_i -> be") | ||
| 1685 | + tmp/tmp.vcd:24782: [WARNING] Entry for name out_ready_i already exists in namemap (out_ready_i -> ce") | ||
| 1686 | + tmp/tmp.vcd:24783: [WARNING] Entry for name out_valid_o already exists in namemap (out_valid_o -> de") | ||
| 1687 | + tmp/tmp.vcd:24784: [WARNING] Entry for name prd_i already exists in namemap (prd_i -> ee") | ||
| 1688 | + tmp/tmp.vcd:24785: [WARNING] Entry for name rst_ni already exists in namemap (rst_ni -> me") | ||
| 1689 | + tmp/tmp.vcd:24786: [WARNING] Entry for name s_i already exists in namemap (s_i -> ne") | ||
| 1690 | + 0 | ||
| 1691 | + 0 | ||
| 1692 | + Building formula for cycle 0: vars 0 clauses 0 | ||
| 1693 | + Checking cycle 0: | ||
| 1694 | + Building formula for cycle 1: vars 1024 clauses 1536 | ||
| 1695 | + Checking cycle 1: | ||
| 1696 | + Building formula for cycle 2: vars 3968 clauses 6528 | ||
| 1697 | + Checking cycle 2: | ||
| 1698 | + Building formula for cycle 3: vars 6298 clauses 11026 | ||
| 1699 | + Checking cycle 3: | ||
| 1700 | + Building formula for cycle 4: vars 14888 clauses 34886 | ||
| 1701 | + Checking cycle 4: | ||
| 1702 | + Building formula for cycle 5: vars 20924 clauses 52734 | ||
| 1703 | + Checking cycle 5: | ||
| 1704 | + Building formula for cycle 6: vars 53986 clauses 143674 | ||
| 1705 | + Checking cycle 6: | ||
| 1706 | + Building formula for cycle 7: vars 57570 clauses 150970 | ||
| 1707 | + Checking cycle 7: | ||
| 1708 | + Building formula for cycle 8: vars 80484 clauses 169282 | ||
| 1709 | + Checking cycle 8: | ||
| 1710 | + Building formula for cycle 9: vars 213770 clauses 504198 | ||
| 1711 | + Checking cycle 9: | ||
| 1712 | + Building formula for cycle 10: vars 594390 clauses 1617276 | ||
| 1713 | + Checking cycle 10: | ||
| 1714 | + Building formula for cycle 11: vars 1024018 clauses 2881744 | ||
| 1715 | + Checking cycle 11: | ||
| 1716 | + Building formula for cycle 12: vars 1704424 clauses 4910342 | ||
| 1717 | + Checking cycle 12: | ||
| 1718 | + Building formula for cycle 13: vars 1713897 clauses 4915466 | ||
| 1719 | + Checking cycle 13: | ||
| 1720 | + Building formula for cycle 14: vars 1834911 clauses 5233038 | ||
| 1721 | + Checking cycle 14: | ||
| 1722 | + Building formula for cycle 15: vars 2258841 clauses 6492446 | ||
| 1723 | + Checking cycle 15: | ||
| 1724 | + Building formula for cycle 16: vars 2734646 clauses 7907830 | ||
| 1725 | + Checking cycle 16: | ||
| 1726 | + Building formula for cycle 17: vars 5868600 clauses 18374416 | ||
| 1727 | + Checking cycle 17: | ||
| 1728 | + Building formula for cycle 18: vars 5922747 clauses 18524578 | ||
| 1729 | + Checking cycle 18: | ||
| 1730 | + Building formula for cycle 19: vars 6100898 clauses 19061808 | ||
| 1731 | + Checking cycle 19: | ||
| 1732 | + Building formula for cycle 20: vars 6427297 clauses 20074334 | ||
| 1733 | + Checking cycle 20: | ||
| 1734 | + Building formula for cycle 21: vars 6949506 clauses 21693947 | ||
| 1735 | + Checking cycle 21: | ||
| 1736 | + Building formula for cycle 22: vars 6949506 clauses 21693947 | ||
| 1737 | + Checking cycle 22: | ||
| 1738 | + Building formula for cycle 23: vars 6949506 clauses 21693947 | ||
| 1739 | + Checking cycle 23: | ||
| 1740 | + Building formula for cycle 24: vars 7057992 clauses 21994175 | ||
| 1741 | + Checking cycle 24: | ||
| 1742 | + Building formula for cycle 25: vars 7407412 clauses 23047989 | ||
| 1743 | + Checking cycle 25: | ||
| 1744 | + Building formula for cycle 26: vars 7797810 clauses 24221073 | ||
| 1745 | + Checking cycle 26: | ||
| 1746 | + Building formula for cycle 27: vars 10939700 clauses 34732235 | ||
| 1747 | + Checking cycle 27: | ||
| 1748 | + Building formula for cycle 28: vars 11268148 clauses 35780811 | ||
| 1749 | + Checking cycle 28: | ||
| 1750 | + Building formula for cycle 29: vars 11268148 clauses 35780811 | ||
| 1751 | + Checking cycle 29: | ||
| 1752 | + Building formula for cycle 30: vars 11268148 clauses 35780811 | ||
| 1753 | + Checking cycle 30: | ||
| 1754 | + Building formula for cycle 31: vars 11376634 clauses 36081039 | ||
| 1755 | + Checking cycle 31: | ||
| 1756 | + Building formula for cycle 32: vars 11726054 clauses 37134853 | ||
| 1757 | + Checking cycle 32: | ||
| 1758 | + Building formula for cycle 33: vars 12116452 clauses 38307937 | ||
| 1759 | + Checking cycle 33: | ||
| 1760 | + Building formula for cycle 34: vars 15258342 clauses 48819099 | ||
| 1761 | + Checking cycle 34: | ||
| 1762 | + Building formula for cycle 35: vars 15586534 clauses 49867675 | ||
| 1763 | + Checking cycle 35: | ||
| 1764 | + Building formula for cycle 36: vars 15619430 clauses 49965979 | ||
| 1765 | + Checking cycle 36: | ||
| 1766 | + Finished in 3948.52 | ||
| 1767 | + The execution is secure | ||
| 1768 | + ``` | ||
| 1769 | + | ||
| 1770 | +Notes: | ||
| 1771 | + | ||
| 1772 | +* This analysis exercises the full data path of the GHASH block and comprises the following operations (controlled by a small [Verilator testbench](https://github.com/vogelpi/opentitan/blob/aes-gcm-review/hw/ip/aes/pre_sca/alma/cpp/verilator_tb_aes_ghash_wrap.cpp)): | ||
| 1773 | + + Initial clearing of all internal registers. | ||
| 1774 | + + Loading the hash subkey H. | ||
| 1775 | + + Loading the encrypted initial counter block S including the subsequent generation of repeatedly used correction terms. | ||
| 1776 | + + Processing a first AAD/ciphertext block including the generation of a correction term that is used for the first block only. | ||
| 1777 | + + Processing a second AAD/ciphertext block. | ||
| 1778 | + + Producing the final authentication tag. | ||
| 1779 | + | ||
| 1780 | +* The [following main changes have been implemented as a result of the formal netlist analysis using Alma](https://github.com/vogelpi/opentitan/commit/ac9333116cbe65fa6b868fe02cb17344d1e2717f) (refer to the [countermeasure spec](https://github.com/vogelpi/opentitan/blob/aes-gcm-review/hw/ip/aes/doc/theory_of_operation.md#mapping-the-masked-algorithm-to-the-hardware) for details): | ||
| 1781 | + + The result of the final addition of Share 1 of S and the unmasked GHASH state is no longer stored into the GHASH state register but directly forwarded to the output, and the state input to this addition is blanked. | ||
| 1782 | + The input multiplexer (`ghash_in_mux`) loses one input. | ||
| 1783 | + + The two 3-input multiplexers selecting the operands for the addition with the GHASH state (`add_in_mux`) are replaced by one-hot multiplexers with registered control signals. | ||
| 1784 | + + The Operand B inputs of both GF multipliers are now blanked. | ||
| 1785 | + The 3-input multiplexer selecting Operand B of the second GF multiplier is replaced by a one-hot multiplexer with registered control signal. | ||
| 1786 | + In addition, the last input slice of Operand B for this multiplier is registered. | ||
| 1787 | +      This allows switching the multiplexer during the last clock cycle of the multiplication to avoid some undesirable transient leakage occurring upon saving the result of the multiplication into the GHASH state register (and this new value propagating through the multiplexer into the multiplier again). | ||
| 1788 | + + The GF multipliers are configured to output zero instead of Operand A (the hash subkey) while busy. | ||
| 1789 | + + The state input for the addition required for the generation of the correction term for Share 0 is blanked. | ||
| 1790 | +    + Between adding the correction terms to the GHASH state for the last time and unmasking the GHASH state, a bubble cycle is added to allow signals to fully settle, thereby avoiding undesirable transient effects unmasking the uncorrected state shares. | ||
| 1791 | +* The overall area impact of these changes is low (+0.16 kGE in Yosys + nangate45). | ||
| 1792 | +* The final design successfully passes the formal masking verification. | ||
| 1793 | + For details regarding tool parameters, check the [analysis script](https://github.com/vogelpi/opentitan/blob/aes-gcm-review/hw/ip/aes/pre_sca/alma/verify_aes_ghash.sh). | ||
| 1794 | + | ||
| 1795 | +#### ChipWhisperer-based FPGA evaluation and TVLA | ||
| 1796 | + | ||
| 1797 | +To underpin the results of the formal verification flow, the hardening of the GHASH module has been analyzed on the ChipWhisperer [CW310](https://rtfm.newae.com/Targets/CW310%20Bergen%20Board/) FPGA board. | ||
| 1798 | +For this analysis, power traces with the ChipWhisperer [Husky](https://rtfm.newae.com/Capture/ChipWhisperer-Husky/) scope were captured during GCM operations. | ||
| 1799 | +Afterwards, a Test Vector Leakage Assessment (TVLA) was performed with the [ot-sca toolset](https://github.com/lowRISC/ot-sca). | ||
| 1800 | +The setup is illustrated in Figure 1. | ||
| 1801 | + | ||
| 1802 | + | ||
| 1803 | +:--: | ||
| 1804 | +**Figure 1**: Target CW310 FPGA board (left) and the CW Husky scope (right). | ||
| 1805 | + | ||
| 1806 | +##### Setup | ||
| 1807 | + | ||
| 1808 | + | ||
| 1809 | +:--: | ||
| 1810 | +**Figure 2**: Measurement setup. The main components are the target board, the scope, and the SCA framework. | ||
| 1811 | + | ||
| 1812 | +Figure 2 gives a detailed overview of the measurement setup that has been utilized to capture the power traces. | ||
| 1813 | +The SCA evaluation framework ot-sca is the central component of the measurement setup. | ||
| 1814 | +It is responsible for communicating with the penetration testing framework that runs on the target FPGA board and with the scope. | ||
| 1815 | +Initially, ot-sca configures the scope (sample rate, number of samples) and the pentest framework (which input, how many encryptions, where to trigger). | ||
| 1816 | + | ||
| 1817 | +Based on the configuration, the pentest framework generates the cipher input, starts the encryption, and sends back the computed tag to ot-sca. | ||
| 1818 | +The trigger is automatically set and unset by the AES hardware block to achieve an accurate & constant trigger window. | ||
| 1819 | +In parallel, the scope waits for the trigger, captures the power consumption, and transfers the traces to the SCA evaluation framework. | ||
| 1820 | +The ot-sca framework stores the trace as well as the cipher configuration in a database. | ||
| 1821 | + | ||
| 1822 | + | ||
| 1823 | +:--: | ||
| 1824 | +**Figure 3**: Power trace with AES encryption rounds visible (*left*). Aligned traces when zooming in (*right*). | ||
| 1825 | + | ||
| 1826 | +Figure 3 depicts power traces captured during AES-GCM encryptions with the setup above. | ||
| 1827 | +As shown in the figure, the traces are well aligned, which allows a sound evaluation. | ||
| 1828 | + | ||
| 1829 | +##### Methodology | ||
| 1830 | + | ||
| 1831 | +To detect whether the hardened GHASH implementation effectively mitigates SCA attacks, the Test Vector Leakage Assessment (TVLA) approach discussed by Rambus in a [whitepaper](https://www.rambus.com/wp-content/uploads/2015/08/TVLA-DTR-with-AES.pdf) is adapted for the GCM mode of AES. | ||
| 1832 | +In TVLA, Welch’s *t*-test is used to determine whether it is possible to statistically distinguish two power trace sets from each other. | ||
| 1833 | +This test returns a value *t* for each sample, where a value of |*t*| > 4.5 means that, with a high probability, a data dependent leakage was detected. | ||
| 1834 | +However, note that this test cannot provide any information about whether the leakage is actually exploitable. | ||
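
The per-sample test described above can be sketched in a few lines of Python. This is a minimal illustration that uses only the standard library; it is not the ot-sca implementation. The function names `welch_t` and `leaking_samples` and the list-of-lists trace layout are assumptions made for this example.

```python
import math
from statistics import mean, variance


def welch_t(fixed, random):
    """Return Welch's t statistic for every sample index.

    `fixed` and `random` are lists of traces. Each trace is a list of
    samples, and all traces are assumed to have the same length.
    """
    t_values = []
    # zip(*traces) transposes the data: one tuple per sample index.
    for col_f, col_r in zip(zip(*fixed), zip(*random)):
        # Standard error built from the unbiased sample variances.
        std_err = math.sqrt(variance(col_f) / len(col_f)
                            + variance(col_r) / len(col_r))
        t_values.append((mean(col_f) - mean(col_r)) / std_err)
    return t_values


def leaking_samples(t_values, threshold=4.5):
    """Return the sample indices where |t| exceeds the threshold."""
    return [i for i, t in enumerate(t_values) if abs(t) > threshold]
```

The sketch assumes at least two traces per set and a non-zero variance at every sample index; otherwise the standard error is undefined.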
| 1835 | + | ||
| 1836 | + | ||
| 1837 | +:-- | ||
| 1838 | +**Figure 4:** TVLA plot showing leakage at around sample 1000. When increasing the number of traces (from 1000 to 10000), the leakage becomes more pronounced. Note that the traces shown in this plot are taken from an arbitrary cryptographic hardware block and not AES. | ||
| 1839 | + | ||
| 1840 | +Figure 4 shows an example of the TVLA plots used throughout this document. The red lines mark the ±4.5 *t*-test threshold. | ||
| 1841 | + | ||
| 1842 | +###### Dataset Generation for FvsR IV & Key | ||
| 1843 | + | ||
| 1844 | +In TVLA, two different trace data sets need to be recorded. | ||
| 1845 | +As described in the [whitepaper](https://www.rambus.com/wp-content/uploads/2015/08/TVLA-DTR-with-AES.pdf), we generate these two trace data sets by using a fixed and a random AES-GCM cipher input set, *i.e.,* the fixed and the random set. | ||
| 1846 | + | ||
| 1847 | +| **Input** | **Fixed Set** | **Random Set** | | ||
| 1848 | +| --- | --- | --- | | ||
| 1849 | +| **Key** | STATIC | RANDOM | | ||
| 1850 | +| **IV** | STATIC | RANDOM | | ||
| 1851 | +| **PTX** | STATIC | STATIC | | ||
| 1852 | +| **AAD** | STATIC | STATIC | | ||
| 1853 | + | ||
| 1854 | + | ||
| 1855 | +As shown in the table above, for our experiment we use a static cipher input for the fixed set. | ||
| 1856 | +For the random set, we use a PRNG to randomly generate the secrets, *i.e.,* key and IV, for each encryption. | ||
| 1857 | +The dataset is generated directly on the device in the pentest framework. | ||
| 1858 | +For each trace, ot-sca stores which dataset the trace belongs to. | ||
| 1859 | + | ||
| 1860 | +With TVLA, the idea is to check whether we are able to distinguish power traces from the fixed and the random set. | ||
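
The fixed-vs-random (FvsR) IV & Key input generation described above can be sketched as follows. This is a hedged illustration only: the actual generation happens on the device inside the pentest framework. The function name `make_fvsr_key_iv_inputs`, the coin-flip interleaving of the two sets, and the fully random IV layout are assumptions made for this example.

```python
import random


def make_fvsr_key_iv_inputs(num_traces, key_fixed, iv_fixed,
                            ptx_static, aad_static, seed=0):
    """Return one dict of cipher inputs per trace, labelled by data set."""
    rng = random.Random(seed)  # stand-in for the on-device PRNG
    inputs = []
    for _ in range(num_traces):
        # Assumption: fixed and random inputs are interleaved by a coin flip.
        if rng.random() < 0.5:
            label, key, iv = "fixed", bytes(key_fixed), bytes(iv_fixed)
        else:
            label = "random"
            key = bytes(rng.randrange(256) for _ in range(len(key_fixed)))
            iv = bytes(rng.randrange(256) for _ in range(len(iv_fixed)))
        # PTX and AAD are static for both sets in this experiment.
        inputs.append({"set": label, "key": key, "iv": iv,
                       "ptx": ptx_static, "aad": aad_static})
    return inputs
```

The `set` label kept with each entry mirrors the per-trace dataset information that ot-sca stores, so that the traces can later be split into the two groups for the *t*-test.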
| 1861 | + | ||
| 1862 | +###### Dataset Generation for FvsR PTX & AAD | ||
| 1863 | + | ||
| 1864 | +For the second experiment, we use a static IV and key and calculate a FvsR PTX and AAD set: | ||
| 1865 | + | ||
| 1866 | +| **Input** | **Fixed Set** | **Random Set** | | ||
| 1867 | +| --- | --- | --- | | ||
| 1868 | +| **Key** | STATIC | STATIC | | ||
| 1869 | +| **IV** | STATIC | STATIC | | ||
| 1870 | +| **PTX** | STATIC | RANDOM | | ||
| 1871 | +| **AAD** | STATIC | RANDOM | | ||
| 1872 | + | ||
| 1873 | + | ||
| 1874 | +##### Results – FvsR IV & Key | ||
| 1875 | + | ||
| 1876 | +In the following, we discuss the analysis results for each GCM phase. | ||
| 1877 | +We start with the results for the FvsR IV & Key datasets. | ||
| 1878 | + | ||
| 1879 | + | ||
| 1880 | +:--: | ||
| 1881 | +**Figure 5:** AES-GCM block diagram. Red lines mark the trigger windows for each analysis step. | ||
| 1882 | + | ||
| 1883 | +As shown in Figure 5, we focus on analyzing (*i*) the generation of the hash subkey H, (*ii*) the encryption of the initial counter block S, (*iii*) the processing of the AAD blocks, (*iv*) the plaintext blocks, and (*v*) the tag generation. Each measurement is conducted with (*a*) masks off and (*b*) masks on to analyze the effectiveness of the masking countermeasure. | ||
| 1884 | + | ||
| 1885 | +###### i) SCA Evaluation of Generating the Hash Subkey H | ||
| 1886 | + | ||
| 1887 | + | ||
| 1888 | +:--: | ||
| 1889 | + | ||
| 1890 | +| **Figure 6a:** Masking Off - 100k traces - **Figure 6b:** Masking On - 1M traces | | ||
| 1891 | + | ||
| 1892 | + | ||
| 1893 | +###### Interpretation | ||
| 1894 | + | ||
| 1895 | +The AES encryption is clearly visible in the form of 12 distinct peaks in the power traces shown in Figures 6a and 6b. | ||
| 1896 | +The 12 peaks correspond to first the loading of the key and the all-zero block into the AES cipher core, followed by the initial round and the 10 full AES rounds (AES-128). | ||
| 1897 | +They spread over approximately 470 samples which corresponds to the 56 target clock cycles a full AES-128 encryption takes. | ||
| 1898 | + | ||
| 1899 | +If the masking is turned off (Figure 6a), first- and second-order leakage is clearly visible throughout the operation. | ||
| 1900 | +If the masking is on (Figure 6b), there is first-order leakage 1) at the beginning as well as 2) at the end of the operation. | ||
| 1901 | + | ||
| 1902 | +1. The leakage at the beginning of the operation is due to incrementing the IV/CTR value (inc32 function in GCM spec) which spreads across the first two AES rounds. | ||
| 1903 | + This produces first-order leakage as the inc32 function implementation isn’t masked. | ||
| 1904 | + It doesn’t need to be masked as the IV is not secret, just the encrypted initial counter block S (i.e., the encrypted IV) is secret in the context of GCM. | ||
| 1905 | +2. The leakage at the end of the operation happens when the masked output of the AES cipher core, i.e., the masked hash subkey H, gets loaded in shares into the GHASH block. | ||
| 1906 | + When studying the RTL, one can see that there is nothing in the path between the AES cipher core and the hash subkey registers inside the GHASH block that could combine the shares and cause this leakage. | ||
| 1907 | + The leakage is most likely due to how the FPGA implementation tool maps the flip flops of the hash subkey register shares to the available FPGA logic slices: if flip flops of the different shares get mapped to the same logic slice, the carry-chain and other muxing logic present in the logic slice can combine the various inputs thereby causing SCA leakage despite these logic outputs not being used. | ||
| 1908 | + We’ve observed similar effects in the past and there is [research giving more insight into this and other FPGA-specific issues](https://ieeexplore.ieee.org/document/10545383). | ||
| 1909 | + | ||
| 1910 | +To summarize, the observed first-order leakage if masking is on (Figure 6b) is not of concern for ASIC implementations. | ||
| 1911 | + | ||
| 1912 | +###### ii) SCA Evaluation of Encrypting the Initial Counter Block | ||
| 1913 | + | ||
| 1914 | + | ||
| 1915 | +:--: | ||
| 1916 | + | ||
| 1917 | +| **Figure 7a:** Masking Off - 100k traces - **Figure 7b:** Masking On - 1M traces | | ||
| 1918 | + | ||
| 1919 | + | ||
| 1920 | +###### Interpretation | ||
| 1921 | + | ||
| 1922 | +Again, the AES encryption is clearly visible in the form of 12 peaks in the power traces shown in Figures 7a and 7b. | ||
| 1923 | +This AES encryption corresponds to the generation of the encrypted initial counter block S. | ||
| 1924 | +The AES encryption is followed by another operation visible in the power trace: the computation of repeatedly used correction terms using the Galois-field multipliers inside GHASH. | ||
| 1925 | +This operation takes 33 target clock cycles (approximately 275 samples). | ||
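
As a quick plausibility check, the sample and cycle counts quoted for the AES-128 encryption (approximately 470 samples, 56 cycles) and for the correction term computation (approximately 275 samples, 33 cycles) should imply about the same number of scope samples per target clock cycle. The helper name `samples_per_cycle` is hypothetical and only used for this illustration.

```python
def samples_per_cycle(num_samples, num_cycles):
    """Return the number of scope samples per target clock cycle."""
    return num_samples / num_cycles


# (samples, cycles) pairs as quoted in the text; sample counts are approximate.
operations = {
    "AES-128 encryption": (470, 56),
    "GHASH correction terms": (275, 33),
}

for name, (samples, cycles) in operations.items():
    print(f"{name}: {samples_per_cycle(samples, cycles):.2f} samples/cycle")
```

Both operations come out at roughly 8.3 to 8.4 samples per target clock cycle, so the two quoted durations are consistent with each other.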
| 1926 | + | ||
| 1927 | +If the masking is turned off (Figure 7a), first- and second-order leakage is clearly visible throughout both operations while being more pronounced during the GHASH operation. | ||
| 1928 | +This is because the GHASH block is smaller and thus produces less noise. | ||
| 1929 | +If the masking is on (Figure 7b), there is first-order leakage 1) at the beginning as well as 2) between the two operations. | ||
| 1930 | + | ||
| 1931 | +1. As before, the leakage at the beginning of the operation is due to incrementing the IV/CTR value (inc32 function in GCM spec) which spreads across the first two AES rounds. | ||
| 1932 | + This produces first-order leakage as the inc32 function implementation isn’t masked. | ||
| 1933 | + It doesn’t need to be masked as the IV is not secret, just the encrypted initial counter block S (i.e., the encrypted IV) is secret in the context of GCM. | ||
| 1934 | +2. As before, the leakage at the end of the AES encryption happens when the masked output of the AES cipher core, i.e., the encrypted initial counter block, gets loaded in shares into the GHASH block. | ||
| 1935 | + When studying the RTL, one can see that there is nothing in the path between the AES cipher core and the GHASH state registers inside the GHASH block that could combine the shares and cause this leakage. | ||
| 1936 | + As before, the leakage is most likely due to how the FPGA implementation tool maps the multiplexers in front of the GHASH state registers to the available FPGA logic slices: Since the multiplexers for both shares use the same control signals, the multiplexing logic can be combined even into the same look-up tables (LUTs) thereby causing SCA leakage. | ||
| 1937 | + We’ve observed similar effects in the past and there is [research giving more insight into this and other FPGA-specific issues](https://ieeexplore.ieee.org/document/10545383). | ||
| 1938 | + | ||
| 1939 | +To summarize, the observed first-order leakage if masking is on (Figure 7b) is not of concern for ASIC implementations. | ||
| 1940 | + | ||
| 1941 | +###### iii) SCA Evaluation of Processing the AAD Blocks | ||
| 1942 | + | ||
| 1943 | +###### Processing AAD Block 0 | ||
| 1944 | + | ||
| 1945 | + | ||
| 1946 | +:--: | ||
| 1947 | + | ||
| 1948 | +| **Figure 8a:** Masking Off - 50k traces - **Figure 8b:** Masking On - 10M traces | | ||
| 1949 | + | ||
| 1950 | + | ||
| 1951 | +###### Interpretation | ||
| 1952 | + | ||
| 1953 | +For AAD blocks, the AES cipher core is not involved. | ||
| 1954 | +However, during the computation of the first AAD block, the GHASH block needs to compute an additional correction term which is used for the very first block only. | ||
| 1955 | +If the masking is turned off (Figure 8a), first- and second-order leakage is clearly visible but only for the first activity block. | ||
| 1956 | +The second activity block involves computing the additional correction terms which requires Share 1 of the encrypted initial counter block to be multiplied by Share 1 of the hash subkey. | ||
| 1957 | +But since the masking is off, both these values are zero for both the fixed and the random set and hence there is no SCA leakage. | ||
| 1958 | +If the masking is turned on (Figure 8b), no SCA leakage is observable which is desirable. | ||
| 1959 | + | ||
| 1960 | +###### Processing AAD Block 1 | ||
| 1961 | + | ||
| 1962 | + | ||
| 1963 | +:--: | ||
| 1964 | + | ||
| 1965 | +| **Figure 9a:** Masking Off - 50k traces - **Figure 9b:** Masking On - 10M traces | | ||
| 1966 | + | ||
| 1967 | + | ||
| 1968 | +###### Interpretation | ||
| 1969 | + | ||
| 1970 | +For the second AAD block (and any subsequent AAD blocks) there is only one activity block corresponding to the Galois-field multiplication. | ||
| 1971 | +If masking is turned off (Figure 9a), there is both first- and second-order leakage observable. | ||
| 1972 | +If the masking is turned on (Figure 9b), no SCA leakage is observable which is desirable. | ||
| 1973 | + | ||
| 1974 | +###### iv) SCA Evaluation of Processing the PTX Blocks | ||
| 1975 | + | ||
| 1976 | +###### Processing PTX Block 0 | ||
| 1977 | + | ||
| 1978 | + | ||
| 1979 | +:--: | ||
| 1980 | + | ||
| 1981 | +| **Figure 10a:** Masking Off - 50k traces - **Figure 10b:** Masking On - 1M traces | | ||
| 1982 | + | ||
| 1983 | + | ||
| 1984 | +###### Interpretation | ||
| 1985 | + | ||
| 1986 | +Like in [ii) SCA Evaluation of Encrypting the Initial Counter Block](#ii-sca-evaluation-of-encrypting-the-initial-counter-block) there is first-order leakage 1) at the beginning and 2) between the two operations if the masking is turned on (Figure 10b). | ||
| 1987 | + | ||
| 1988 | +1. As before, the leakage at the beginning of the operation is due to incrementing the IV/CTR value (inc32 function in GCM spec) which spreads across the first two AES rounds. | ||
| 1989 | + This produces first-order leakage as the inc32 function implementation isn’t masked. | ||
| 1990 | + It doesn’t need to be masked as the IV is not secret, just the encrypted initial counter block S (i.e., the encrypted IV) is secret in the context of GCM. | ||
| 1991 | +2. The leakage between the two operations is due to the unmasking of the AES cipher core output, the addition of input data to produce the ciphertext, and writing this value to the GHASH block and the output data registers. | ||
| 1992 | + It’s not related to the hash subkey H or the initial counter block S (i.e. the two secrets involved in the GHASH part of GCM). | ||
| 1993 | +    But since the AAD and the plaintext have been chosen to be the same for all traces in the fixed and the random sets, the traces of the fixed set all produce the same ciphertext and thus are expected to exhibit a static power signature for this step, whereas the ciphertext of the random set is randomized through the random key and IV. | ||
| 1994 | + However, since the ciphertext is not secret in the context of GCM, this leakage is of no concern. | ||
| 1995 | + | ||
| 1996 | +To summarize, the observed first-order leakage if masking is on (Figure 10b) is not of concern. | ||
| 1997 | + | ||
| 1998 | +###### Processing PTX Block 1 | ||
| 1999 | + | ||
| 2000 | + | ||
| 2001 | +:--: | ||
| 2002 | + | ||
| 2003 | +| **Figure 11a:** Masking Off - 50k traces - **Figure 11b:** Masking On - 1M traces | | ||
| 2004 | + | ||
| 2005 | + | ||
| 2006 | +###### Interpretation | ||
| 2007 | + | ||
| 2008 | +As before (PTX Block 0), there is some first-order leakage observable when the masking is turned on. | ||
| 2009 | +For the same reasons as before, this leakage is not of concern. | ||
| 2010 | + | ||
| 2011 | +###### v) SCA Evaluation of the Tag Generation | ||
| 2012 | + | ||
| 2013 | + | ||
| 2014 | +:--: | ||
| 2015 | + | ||
| 2016 | +| **Figure 12a:** Masking Off - 50k traces - **Figure 12b:** Masking On - 1M traces | | ||
| 2017 | + | ||
| 2018 | + | ||
| 2019 | +###### Interpretation | ||
| 2020 | + | ||
| 2021 | +The generation of the final authentication tag consists of two operations. | ||
| 2022 | +1) The 128-bit block containing the AAD and ciphertext lengths is hashed and the correction terms are added. | ||
| 2023 | + The GHASH state is unmasked (still masked with the encrypted initial counter block S) and Share 1 of S is added to write the final authentication tag to the data output registers readable by software. | ||
| 2024 | +2) In parallel to writing the final authentication tag to the data output registers, the entire internal state is cleared to random values and an additional multiplication is triggered to clear the internal state of the Galois-field multipliers and the correction term registers. | ||
| 2025 | + | ||
| 2026 | +If masking is turned off (Figure 12a), there is both first- and second-order leakage observable during the first activity block (tag generation) but not during the clearing operation. | ||
| 2027 | +If the masking is turned on (Figure 12b), some SCA leakage is observable between the two operations, i.e., when the final authentication tag is written to the output data registers. | ||
| 2028 | +This leakage is expected as both the fixed and the random data sets use a static AAD and plaintext. | ||
| 2029 | +This means that the tag for the fixed data set is fixed whereas the tags for the random set get randomized through the ciphertext (random due to the random key and IV). | ||
| 2030 | + | ||
| 2031 | +To summarize, the observed first-order leakage if masking is on (Figure 12b) is not of concern. | ||
| 2032 | + | ||
| 2033 | +##### Results – FvsR PTX & AAD | ||
| 2034 | + | ||
| 2035 | +In the following, we discuss the analysis results for the FvsR PTX & AAD datasets. | ||
| 2036 | +These experiments were specifically done to investigate leakage peaks identified for the FvsR Key & IV datasets that are attributed to how the FPGA implementation tool maps flip flops and multiplexer shares to the available FPGA logic slices. | ||
| 2037 | + | ||
| 2038 | +###### i) SCA Evaluation of Generating the Hash Subkey H | ||
| 2039 | + | ||
| 2040 | + | ||
| 2041 | +:--: | ||
| 2042 | + | ||
| 2043 | +| **Figure 13a:** Masking Off - 50k traces - **Figure 13b:** Masking On - 1M traces | | ||
| 2044 | + | ||
| 2045 | + | ||
| 2046 | +###### Interpretation | ||
| 2047 | + | ||
| 2048 | +There is no SCA leakage visible in either case, without masking (Figure 13a) or with masking turned on (Figure 13b). | ||
| 2049 | +This is expected as the hash subkey generation doesn’t involve the plaintext and the AAD but only the key and IV. | ||
| 2050 | +Both the fixed and random set use the same static key and IV. | ||
| 2051 | + | ||
| 2052 | +This experiment was specifically done to check the leakage identified in Figure 6b, which is attributed to how the FPGA implementation tool maps the flip flops of the hash subkey register shares to the available FPGA logic slices. | ||
| 2053 | +As expected, the leakage peak is now gone. | ||
| 2054 | + | ||
| 2055 | +###### ii) SCA Evaluation of Encrypting the Initial Counter Block | ||
| 2056 | + | ||
| 2057 | + | ||
| 2058 | +:--: | ||
| 2059 | + | ||
| 2060 | +| **Figure 14a:** Masking Off - 50k traces - **Figure 14b:** Masking On - 1M traces | | ||
| 2061 | + | ||
| 2062 | + | ||
| 2063 | +###### Interpretation | ||
| 2064 | + | ||
| 2065 | +There is no SCA leakage visible in either case, without masking (Figure 14a) or with masking turned on (Figure 14b). | ||
| 2066 | +This is expected as the encryption of the initial counter block and the subsequent computation of repeatedly used correction terms don’t involve the plaintext and the AAD but only the key and IV. | ||
| 2067 | +Both the fixed and random set use the same static key and IV. | ||
| 2068 | + | ||
| 2069 | +This experiment was specifically done to check the leakage identified in Figure 7b, which is attributed to how the FPGA implementation tool maps the multiplexers in front of the GHASH state registers to the available FPGA logic slices. | ||
| 2070 | +As expected, the leakage peak is now gone. | ||
| 2071 | + | ||
| 2072 | +###### iv) SCA Evaluation of Processing the PTX Block 0 | ||
| 2073 | + | ||
| 2074 | + | ||
| 2075 | +:--: | ||
| 2076 | + | ||
| 2077 | +| **Figure 15a:** Masking Off - 100k traces - **Figure 15b:** Masking On - 1M traces | | ||
| 2078 | + | ||
| 2079 | + | ||
| 2080 | +###### Interpretation | ||
| 2081 | + | ||
| 2082 | +With the masking turned off (Figure 15a), there is first-order leakage 1) at the beginning of the operation and 2) throughout the entire GHASH operation. | ||
| 2083 | + | ||
| 2084 | +1. The leakage at the beginning of the operation is due to the input data (the plaintext) being written to an internal buffer register. | ||
| 2085 | + The AES cipher is operated in counter mode, meaning it doesn’t encrypt the input data but the counter value (incremented IV). | ||
| 2086 | + Because the IV is fixed for both the fixed and the random data set, no leakage is observed during the AES encryption even if the masking is off. | ||
| 2087 | + At the end of the AES encryption, the output of the AES cipher core is added to the content of the buffer register to produce the ciphertext which is then forwarded to the GHASH block and to the data output registers. | ||
| 2088 | +2. The GHASH operation then processes this ciphertext. | ||
| 2089 | + The observed leakage when the masking is off is expected. | ||
| 2090 | + | ||
| 2091 | +With the masking turned on (Figure 15b), the first-order leakage at the beginning of the operation remains visible. The reason for this is that the internal register buffering the previous input data is not masked. | ||
| 2092 | +This is of no concern as the leakage is not related to key or IV. | ||
| 2093 | + | ||
| 2094 | +Another first-order leakage peak is visible between the AES encryption and the GHASH operation. | ||
| 2095 | +This leakage is due to the unmasked AES cipher core output being added to the input data (coming from the internal buffer register) and the result being stored to the output data register. | ||
| 2096 | +As key and IV are static and identical for both the fixed and the random data set, the cipher core output is the same for both sets. | ||
| 2097 | +Any difference in the power signature between the two sets is due to the different plaintext / ciphertext. | ||
| 2098 | +Again, this is to be expected and of no concern as the ciphertext is not secret in the context of GCM. | ||
| 2099 | + | ||
| 2100 | +#### Reproducing the FPGA Experiments | ||
| 2101 | + | ||
| 2102 | +##### Prerequisites | ||
| 2103 | + | ||
| 2104 | +###### (i) Setting up the CW310 and CW Husky | ||
| 2105 | + | ||
| 2106 | +Please follow the guide [here](https://github.com/lowRISC/ot-sca/blob/master/doc/getting_started.md#cw310) to prepare the CW310 and CW Husky for the SCA measurements. | ||
| 2107 | + | ||
| 2108 | +###### (ii) Generating the FPGA Bitstream | ||
| 2109 | + | ||
| 2110 | +Follow the guide [here](https://opentitan.org/book/doc/getting_started/install_vivado/index.html) to install Xilinx Vivado. Please note that a valid license is needed to generate bitstreams for the CW310 FPGA board. | ||
| 2111 | + | ||
| 2112 | +Then, build the bitstream from the [aes-gcm-sca-bitstream](https://github.com/vogelpi/opentitan/tree/aes-gcm-sca-bitstream) branch. | ||
| 2113 | +This branch includes the AES-GCM and applies several optimizations (disabling certain features to reduce the area utilization) to improve the SCA measurements. | ||
| 2114 | +```sh | ||
| 2115 | +git clone https://github.com/vogelpi/opentitan.git | ||
| 2116 | +cd opentitan | ||
| 2117 | +git checkout aes-gcm-sca-bitstream | ||
| 2118 | +./bazelisk.sh build //hw/bitstream/vivado:fpga_cw310_test_rom | ||
| 2119 | +cp bazel-bin/hw/bitstream/vivado/build.fpga_cw310/synth-vivado/lowrisc_systems_chip_earlgrey_cw310_0.1.bit . | ||
| 2120 | +``` | ||
| 2121 | + | ||
| 2122 | +The resulting bitstream is `lowrisc_systems_chip_earlgrey_cw310_0.1.bit`. | ||
| 2123 | + | ||
| 2124 | +###### (iii) Compiling the Penetration Testing Binary | ||
| 2125 | + | ||
| 2126 | +The penetration testing binary that is running on the target is the framework that receives commands from the side-channel evaluation framework and triggers the AES-GCM operations. | ||
| 2127 | +```sh | ||
| 2128 | +git clone https://github.com/vogelpi/opentitan.git | ||
| 2129 | +cd opentitan | ||
| 2130 | +git checkout aes-gcm-review | ||
| 2131 | +./bazelisk.sh build //sw/device/tests/penetrationtests/firmware:firmware_fpga_cw310_test_rom | ||
| 2132 | +cp bazel-bin/sw/device/tests/penetrationtests/firmware/firmware_fpga_cw310_test_rom_fpga_cw310_test_rom.bin sca_ujson_fpga_cw310.bin | ||
| 2133 | +``` | ||
| 2134 | + | ||
| 2135 | +The resulting penetration testing binary is `sca_ujson_fpga_cw310.bin`. | ||
| 2136 | + | ||
| 2137 | +###### (iv) Setting up the Side-Channel Evaluation Framework | ||
| 2138 | + | ||
| 2139 | +Clone the ot-sca repository and switch to the dedicated AES-GCM branch: | ||
| 2140 | +```sh | ||
| 2141 | +git clone https://github.com/lowRISC/ot-sca.git | ||
| 2142 | +cd ot-sca | ||
| 2143 | +git checkout ot-sca-aes-gcm | ||
| 2144 | +``` | ||
| 2145 | + | ||
| 2146 | +Then, follow [this](https://github.com/lowRISC/ot-sca/blob/master/doc/getting_started.md#installing-on-a-machine) guideline to prepare your machine for the measurements. | ||
| 2147 | + | ||
| 2148 | +Afterwards, copy the bitstream to `ot-sca/objs/lowrisc_systems_chip_earlgrey_cw310_0.1.bit` and the binary to `ot-sca/objs/sca_ujson_fpga_cw310.bin`. | ||
| 2149 | + | ||
| 2150 | +Finally, determine the port the CW310 opened on your machine (e.g., `/dev/ttyACM2`) and set it accordingly in the `port` field of the `ot-sca/capture/configs/aes_gcm_sca_cw310.yaml` configuration file. | ||
| 2151 | + | ||
| 2152 | +##### Capturing Traces | ||
| 2153 | + | ||
| 2154 | +After fulfilling the prerequisites, traces can be captured using ot-sca. | ||
| 2155 | +To configure the measurement, adapt the configuration file located at `ot-sca/capture/configs/aes_gcm_sca_cw310.yaml`. | ||
| 2156 | +The following parameters can be changed: | ||
| 2157 | +```yml | ||
| 2158 | +husky: | ||
| 2159 | + # Number of encryptions performed in one batch. | ||
| 2160 | + num_segments: 35 | ||
| 2161 | + # Number of cycles that are captured by the CW Husky. | ||
| 2162 | + num_cycles: 320 | ||
| 2163 | +capture: | ||
| 2164 | + # Number of traces to capture. | ||
| 2165 | + num_traces: 100000 | ||
| 2166 | + # Number of traces to keep in memory before flushing to the disk. | ||
| 2167 | + trace_threshold: 50000 | ||
| 2168 | +test: | ||
| 2169 | + # Values used for the fixed set. | ||
| 2170 | + iv_fixed: [0xDE, 0xAD, 0xBE, 0xEF, 0xCA, 0xFE, 0xBA, 0xAD, 0xF0, 0xCA, | ||
| 2171 | + 0xCC, 0x1A, 0x00, 0x00, 0x00, 0x00] | ||
| 2172 | + key_fixed: [0x81, 0x1E, 0x37, 0x31, 0xB0, 0x12, 0x0A, 0x78, 0x42, 0x78, | ||
| 2173 | + 0x1E, 0x22, 0xB2, 0x5C, 0xDD, 0xF9] | ||
| 2174 | + # Static values that are used by the fixed and the random set. | ||
| 2175 | + ptx_blocks: 2 | ||
| 2176 | + ptx_static: [[0xAA, 0xAA, 0xAA, 0xAA, 0xAA, 0xAA, 0xAA, 0xAA, 0xAA, 0xAA, | ||
| 2177 | + 0xAA, 0xAA, 0xAA, 0xAA, 0xAA, 0xAA], [0xBB, 0xBB, 0xBB, | ||
| 2178 | + 0xBB, 0xBB, 0xBB, 0xBB, 0xBB, 0xBB, 0xBB, 0xBB, 0xBB, 0xBB, | ||
| 2179 | + 0xBB, 0xBB, 0xBB]] | ||
| 2180 | + ptx_last_block_len_bytes: 16 | ||
| 2181 | + aad_blocks: 2 | ||
| 2182 | + aad_static: [[0xCC, 0xCC, 0xCC, 0xCC, 0xCC, 0xCC, 0xCC, 0xCC, 0xCC, 0xCC, | ||
| 2183 | + 0xCC, 0xCC, 0xCC, 0xCC, 0xCC, 0xCC], [0xDD, 0xDD, 0xDD, | ||
| 2184 | + 0xDD, 0xDD, 0xDD, 0xDD, 0xDD, 0xDD, 0xDD, 0xDD, 0xDD, | ||
| 2185 | + 0xDD, 0xDD, 0xDD, 0xDD]] | ||
| 2186 | + aad_last_block_len_bytes: 16 | ||
| 2187 | + # Trigger configuration (select only one). | ||
| 2188 | + # [Hash sub key, Init. block, AAD block, PTX block, TAG block] | ||
| 2189 | + triggers: [False, False, False, False, True] | ||
| 2190 | + # Which AAD or PTX block. 0 = first block. | ||
| 2191 | + trigger_block: 0 | ||
| 2192 | + # 32-bit seed for masking on device. To switch off the masking, use 0 | ||
| 2193 | + # as an LFSR seed. | ||
| 2194 | + lfsr_seed: 0x00000000 | ||
| 2195 | + #lfsr_seed: 0xdeadbeef | ||
| 2196 | +``` | ||
| 2197 | + | ||
| 2198 | +After tweaking the configuration, the traces can be captured by executing: | ||
| 2199 | + | ||
| 2200 | +```sh | ||
| 2201 | +cd capture | ||
| 2202 | +./capture_aes_gcm.py -c configs/aes_gcm_sca_cw310.yaml -p aes_gcm_sca | ||
| 2203 | +``` | ||
| 2204 | + | ||
| 2205 | +Here, the `-c` parameter specifies the configuration file and `-p` the database in which the traces are stored. | ||
| 2206 | + | ||
| 2207 | +##### Performing the TVLA | ||
| 2208 | + | ||
| 2209 | +After capturing the traces, the TVLA can be performed by switching into the `ot-sca/analysis` folder, copying the `ot-sca/analysis/configs/tvla_cfg_kmac.yaml` file to `ot-sca/analysis/configs/tvla_cfg_aes_gcm.yaml`, and modifying the configuration file: | ||
| 2210 | +```yml | ||
| 2211 | +project_file: ../capture/projects/aes_gcm_sca | ||
| 2212 | +trace_file: null | ||
| 2213 | +trace_start: null | ||
| 2214 | +trace_end: null | ||
| 2215 | +leakage_file: null | ||
| 2216 | +save_to_disk: null | ||
| 2217 | +save_to_disk_ttest: null | ||
| 2218 | +round_select: null | ||
| 2219 | +byte_select: null | ||
| 2220 | +input_histogram_file: null | ||
| 2221 | +output_histogram_file: null | ||
| 2222 | +number_of_steps: 1 | ||
| 2223 | +ttest_step_file: null | ||
| 2224 | +plot_figures: true | ||
| 2225 | +test_type: "GENERAL_KEY" | ||
| 2226 | +mode: aes | ||
| 2227 | +filter_traces: true | ||
| 2228 | +trace_threshold: 50000 | ||
| 2229 | +trace_db: ot_trace_library | ||
| 2230 | +``` | ||
| 2231 | + | ||
| 2232 | +By calling | ||
| 2233 | +```sh | ||
| 2234 | +./tvla.py --cfg-file tvla_cfg_aes_gcm.yaml run-tvla | ||
| 2235 | +``` | ||
| 2236 | +the TVLA plot is generated. | ||
| 2237 | + | ||
| 1459 | 2238 | ## PCR vault | |
| 1460 | 2239 | ||
| 1461 | 2240 | * Platform Configuration Register (PCR) vault is a register file that stores measurements to be used by the microcontroller. | |
| 1462 | -* PCR entries are read-only registers of 384 bits each. | ||
| 2241 | +* PCR entries are read-only registers of 512 bits each. | ||
| 1463 | 2242 | * Control bits allow for entries to be cleared by FW, which sets their values back to 0. | |
| 1464 | 2243 | * A lock bit can be set by FW to prevent the entry from being cleared. The lock bit is sticky and only resets on a powergood cycle. | |
| 1465 | 2244 | ||
| @@ -1490,23 +2269,23 @@ | |||
| 1490 | 2269 | ||
| 1491 | 2270 | ## Key vault | |
| 1492 | 2271 | ||
| 1493 | -Key Vault (KV) is a register file that stores the keys to be used by the microcontroller, but this register file is not observed by the microcontroller. Each cryptographic function has a control register and functional block designed to read from and write to the KV. | ||
| 2272 | +Key Vault (KV) is a register file that stores the keys to be used by the microcontroller, but this register file is not observed by the microcontroller. Each cryptographic function has a control register and functional block designed to read from and write to the KV. | ||
| 1494 | 2273 | ||
| 1495 | 2274 | | KV register | Description | | |
| 1496 | 2275 | | :-------------------------------- | :-------------------------------------------------------- | | |
| 1497 | -| Key Control\[31:0\] | 32 Control registers, 32 bits each | | ||
| 1498 | -| Key Entry\[31:0\]\[11:0\]\[31:0\] | 32 Key entries, 384 bits each <br>No read or write access | | ||
| 2276 | +| Key Control\[23:0\] | 24 Control registers, 32 bits each | | ||
| 2277 | +| Key Entry\[23:0\]\[15:0\]\[31:0\] | 24 Key entries, 512 bits each <br>No read or write access | | ||
| 1499 | 2278 | ||
| 1500 | 2279 | ||
| 1501 | 2280 | ### Key vault functional block | |
| 1502 | 2281 | ||
| 1503 | -Keys and measurements are stored in 512b register files. These have no read or write path from the microcontroller. The entries are read through a passive read mux driven by each cryptographic block. Locked entries return zeroes. | ||
| 1504 | - | ||
| 1505 | -Entries in the KV must be cleared via control register, or by de-assertion of pwrgood. | ||
| 1506 | - | ||
| 1507 | -Each entry has a control register that is writable by the microcontroller. | ||
| 1508 | - | ||
| 1509 | -The destination valid field is programmed by FW in the cryptographic block generating the key, and it is passed here at generation time. This field cannot be modified after the key is generated and stored in the KV. | ||
| 2282 | +Keys and measurements are stored in 512b register files. These have no read or write path from the microcontroller. The entries are read through a passive read mux driven by each cryptographic block. Locked entries return zeroes. | ||
| 2283 | + | ||
| 2284 | +Entries in the KV must be cleared via control register, or by de-assertion of pwrgood. | ||
| 2285 | + | ||
| 2286 | +Each entry has a control register that is writable by the microcontroller. | ||
| 2287 | + | ||
| 2288 | +The destination valid field is programmed by FW in the cryptographic block generating the key, and it is passed here at generation time. This field cannot be modified after the key is generated and stored in the KV. | ||
| 1510 | 2289 | ||
| 1511 | 2290 | | KV Entry Ctrl Fields | Reset | Description | | |
| 1512 | 2291 | | --------------------------- | ------------------- | ------------------------ | | |
| @@ -1515,11 +2294,11 @@ | |||
| 1515 | 2294 | | Clear\[2\] | cptra_rst_b | If unlocked, setting the clear bit causes KV to clear the associated entry. The clear bit is reset after entry is cleared. | | |
| 1516 | 2295 | | Copy\[3\] | cptra_rst_b | ENHANCEMENT: Setting the copy bit causes KV to copy the key to the entry written to Copy Dest field. | | |
| 1517 | 2296 | | Copy Dest\[8:4\] | cptra_rst_b | ENHANCEMENT: Destination entry for the copy function. | | |
| 1518 | -| Dest_valid\[16:9\] | hard_reset_b | KV entry can be used with the associated cryptographic block if the appropriate index is set. <br>\[0\] - HMAC KEY <br>\[1\] - HMAC BLOCK <br>\[2\] - SHA BLOCK <br>\[2\] - ECC PRIVKEY <br>\[3\] - ECC SEED <br>\[7:5\] - RSVD | | ||
| 2297 | +| Dest_valid\[16:9\] | hard_reset_b | KV entry can be used with the associated cryptographic block if the appropriate index is set. <br>\[0\] - HMAC KEY <br>\[1\] - HMAC BLOCK <br>\[2\] - MLDSA SEED <br>\[3\] - ECC PRIVKEY <br>\[4\] - ECC SEED <br>\[5\] - AES KEY <br>\[7:6\] - RSVD | | ||
| 1519 | 2298 | | last_dword\[20:19\] | hard_reset_b | Store the offset of the last valid dword, used to indicate the last cycle for read operations. | | |
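As an illustration of the field layout in the table above, the following sketch decodes a 32-bit KV entry control value into the fields visible in this excerpt. The helper names `bits` and `decode_kv_ctrl` are hypothetical, and bits \[1:0\] are left undecoded because their rows are not shown here.

```python
# Dest_valid bit index -> cryptographic block, taken from the table above.
DEST_VALID_NAMES = {
    0: "HMAC KEY",
    1: "HMAC BLOCK",
    2: "MLDSA SEED",
    3: "ECC PRIVKEY",
    4: "ECC SEED",
    5: "AES KEY",
}


def bits(value: int, msb: int, lsb: int) -> int:
    """Extract value[msb:lsb] as an unsigned integer."""
    return (value >> lsb) & ((1 << (msb - lsb + 1)) - 1)


def decode_kv_ctrl(reg: int) -> dict:
    """Decode the control fields shown in this excerpt (bits [1:0] omitted)."""
    dest_valid = bits(reg, 16, 9)
    return {
        "clear": bool(bits(reg, 2, 2)),
        "copy": bool(bits(reg, 3, 3)),
        "copy_dest": bits(reg, 8, 4),
        "dest_valid": [
            name for idx, name in DEST_VALID_NAMES.items()
            if (dest_valid >> idx) & 1
        ],
        "last_dword": bits(reg, 20, 19),
    }


# Example value (assumption): clear set, copy_dest = 5, HMAC KEY and AES KEY valid.
example = (1 << 2) | (5 << 4) | (0b100001 << 9) | (3 << 19)
decoded = decode_kv_ctrl(example)
```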
| 1520 | 2299 | ||
| 1521 | 2300 | ||
| 1522 | -### Key vault cryptographic functional block | ||
| 2301 | +### Key vault cryptographic functional block | ||
| 1523 | 2302 | ||
| 1524 | 2303 | A generic block is instantiated in each cryptographic block to enable access to KV. | |
| 1525 | 2304 | ||
| @@ -1551,10 +2330,11 @@ | |||
| 1551 | 2330 | | write_entry\[5:1\] | Key vault entry to store the result. | | |
| 1552 | 2331 | | hmac_key_dest_valid\[6\] | HMAC KEY is a valid destination. | | |
| 1553 | 2332 | | hmac_block_dest_valid\[7\] | HMAC BLOCK is a valid destination. | | |
| 1554 | -| sha_block_dest_valid\[8\] | SHA BLOCK is a valid destination. | | ||
| 2333 | +| mldsa_seed_dest_valid\[8\] | MLDSA SEED is a valid destination. | | ||
| 1555 | 2334 | | ecc_pkey_dest_valid\[9\] | ECC PKEY is a valid destination. | | |
| 1556 | 2335 | | ecc_seed_dest_valid\[10\] | ECC SEED is a valid destination. | | |
| 1557 | -| rsvd\[31:11\] | Reserved field | | ||
| 2336 | +| aes_key_dest_valid\[11\] | AES KEY is a valid destination. | | ||
| 2337 | +| rsvd\[31:12\] | Reserved field | | ||
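The write control layout above can be sketched as a small packing helper. This is a hedged illustration only: `encode_kv_write_ctrl` and the short destination names are hypothetical, bit \[0\] is left at zero because its row is not shown in this excerpt, and the 0 to 23 range check assumes the 24 key entries described earlier.

```python
# Bit position of each *_dest_valid flag, taken from the table above.
DEST_VALID_BITS = {
    "hmac_key": 6,
    "hmac_block": 7,
    "mldsa_seed": 8,
    "ecc_pkey": 9,
    "ecc_seed": 10,
    "aes_key": 11,
}

NUM_KV_ENTRIES = 24  # assumption: valid entry indices are 0..23


def encode_kv_write_ctrl(write_entry: int, dests) -> int:
    """Pack write_entry[5:1] and the destination-valid flags into one word."""
    if not 0 <= write_entry < NUM_KV_ENTRIES:
        raise ValueError("write_entry must select one of the 24 KV entries")
    reg = write_entry << 1  # write_entry occupies bits [5:1]
    for dest in dests:
        reg |= 1 << DEST_VALID_BITS[dest]
    return reg


# Store the result in entry 3, usable as an HMAC key or an AES key.
word = encode_kv_write_ctrl(3, ["hmac_key", "aes_key"])
```

Keeping the bit positions in a single dictionary means a future field only needs one new entry rather than another hand-written shift.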
| 1558 | 2338 | ||
| 1559 | 2339 | ||
| 1560 | 2340 | | KV Status Reg | Description | | |
| @@ -1583,12 +2363,12 @@ | |||
| 1583 | 2363 | ||
| 1584 | 2364 | ### Key vault de-obfuscation block operation | |
| 1585 | 2365 | ||
| 1586 | -A de-obfuscation engine (DOE) is used in conjunction with AES cryptography to de-obfuscate the UDS and field entropy. | ||
| 1587 | - | ||
| 1588 | -1. The obfuscation key is driven to the AES key. The data to be decrypted (either obfuscated UDS or obfuscated field entropy) is fed into the AES data. | ||
| 1589 | -2. An FSM manually drives the AES engine and writes the decrypted data back to the key vault. | ||
| 1590 | -3. FW programs the DOE with the requested function (UDS or field entropy de-obfuscation), and the destination for the result. | ||
| 1591 | -4. After de-obfuscation is complete, FW can clear out the UDS and field entropy values from any flops until cptra\_pwrgood de-assertion. | ||
| 2366 | +A de-obfuscation engine (DOE) is used in conjunction with AES cryptography to de-obfuscate the UDS and field entropy. | ||
| 2367 | + | ||
| 2368 | +1. The obfuscation key is driven to the AES key. The data to be decrypted (either obfuscated UDS or obfuscated field entropy) is fed into the AES data. | ||
| 2369 | +2. An FSM manually drives the AES engine and writes the decrypted data back to the key vault. | ||
| 2370 | +3. FW programs the DOE with the requested function (UDS or field entropy de-obfuscation), and the destination for the result. | ||
| 2371 | +4. After de-obfuscation is complete, FW can clear out the UDS and field entropy values from any flops until cptra\_pwrgood de-assertion. | ||
| 1592 | 2372 | ||
| 1593 | 2373 | The following tables describe DOE register and control fields. | |
| 1594 | 2374 | ||
| @@ -1605,13 +2385,13 @@ | |||
| 1605 | 2385 | | DEST\[4:2\] | Cptra_rst_b | Destination register for the result of the de-obfuscation flow. Field entropy writes into DEST and DEST+1. <br>Key entry only; the result cannot go to a PCR. | | |
| 1606 | 2386 | ||
| 1607 | 2387 | ||
| 1608 | -### Key vault de-obfuscation flow | ||
| 1609 | - | ||
| 1610 | -1. ROM loads IV into DOE. ROM writes to the DOE control register the destination for the de-obfuscated result and sets the appropriate bit to run UDS and/or the field entropy flow. | ||
| 1611 | -2. DOE state machine takes over and loads the Caliptra obfuscation key into the key register. | ||
| 1612 | -3. Next, either the obfuscated UDS or field entropy are loaded into the block register 4 DWORDS at a time. | ||
| 1613 | -4. Results are written to the KV entry specified in the DEST field of the DOE control register. | ||
| 1614 | -5. State machine resets the appropriate RUN bit when the de-obfuscated key is written to KV. FW can poll this register to know when the flow is complete. | ||
| 2388 | +### Key vault de-obfuscation flow | ||
| 2389 | + | ||
| 2390 | +1. ROM loads IV into DOE. ROM writes to the DOE control register the destination for the de-obfuscated result and sets the appropriate bit to run UDS and/or the field entropy flow. | ||
| 2391 | +2. DOE state machine takes over and loads the Caliptra obfuscation key into the key register. | ||
| | 2392 | +3. Next, either the obfuscated UDS or the obfuscated field entropy is loaded into the block register 4 DWORDS at a time. | ||
| 2393 | +4. Results are written to the KV entry specified in the DEST field of the DOE control register. | ||
| 2394 | +5. State machine resets the appropriate RUN bit when the de-obfuscated key is written to KV. FW can poll this register to know when the flow is complete. | ||
| 1615 | 2395 | 6. The clear obf secrets command flushes the obfuscation key, the obfuscated UDS, and the field entropy from the internal flops. This should be done by ROM after both de-obfuscation flows are complete. | |
| 1616 | 2396 | ||
| 1617 | 2397 | ## Data vault | |
| @@ -1626,7 +2406,7 @@ | |||
| 1626 | 2406 | ||
| 1627 | 2407 | ## Cryptographic blocks fatal and non-fatal errors | |
| 1628 | 2408 | ||
| 1629 | -The following table describes cryptographic errors. | ||
| 2409 | +The following table describes cryptographic errors. | ||
| 1630 | 2410 | ||
| 1631 | 2411 | | Errors | Error type | Description | | |
| 1632 | 2412 | | :----------- | :----------------- | :-------------------------------------------------------------------------------------------------------------------------------------------------------- | | |
| @@ -1654,6 +2434,7 @@ | |||
| 1654 | 2434 | | DRBG | Deterministic Random Bit Generator | | |
| 1655 | 2435 | | DWORD | 32-bit (4-byte) data element | | |
| 1656 | 2436 | | ECDSA | Elliptic Curve Digital Signature Algorithm | | |
| | 2437 | +| ECDH | Elliptic Curve Diffie-Hellman Key Exchange | | ||
| 1657 | 2438 | | FMC | FW First Mutable Code | | |
| 1658 | 2439 | | FSM | Finite State Machine | | |
| 1659 | 2440 | | GPU | Graphics Processing Unit | | |
| @@ -1693,20 +2474,21 @@ | |||
| 1693 | 2474 | ||
| 1694 | 2475 | # References | |
| 1695 | 2476 | ||
| 1696 | -1. J. Strömbergson, "Secworks," \[Online\]. Available at https://github.com/secworks. | ||
| 1697 | -2. NIST, Federal Information Processing Standards Publication (FIPS PUB) 180-4 Secure Hash Standard (SHS). | ||
| 1698 | -3. OpenSSL \[Online\]. Available at https://www.openssl.org/docs/man3.0/man3/SHA512.html. | ||
| 1699 | -4. N. W. Group, RFC 3394, Advanced Encryption Standard (AES) Key Wrap Algorithm, 2002. | ||
| 1700 | -5. NIST, Federal Information Processing Standards Publication (FIPS) 198-1, The Keyed-Hash Message Authentication Code, 2008. | ||
| 1701 | -6. N. W. Group, RFC 4868, Using HMAC-SHA256, HMAC-SHA384, and HMAC-SHA512 with IPsec, 2007. | ||
| 1702 | -7. RFC 6979, Deterministic Usage of the Digital Signature Algorithm (DSA) and Elliptic Curve Digital Signature Algorithm (ECDSA), 2013. | ||
| 1703 | -8. TCG, Hardware Requirements for a Device Identifier Composition Engine, 2018. | ||
| 1704 | -9. Coron, J.-S.: Resistance against differential power analysis for elliptic curve cryptosystems. In: Koç, Ç.K., Paar, C. (eds.) CHES 1999. LNCS, vol. 1717, pp. 292–302. | ||
| 1705 | -10. Schindler, W., Wiemers, A.: Efficient side-channel attacks on scalar blinding on elliptic curves with special structure. In: NIST Workshop on ECC Standards (2015). | ||
| 1706 | -11. National Institute of Standards and Technology, "Digital Signature Standard (DSS)", Federal Information Processing Standards Publication (FIPS PUB) 186-4, July 2013. | ||
| 2477 | +1. J. Strömbergson, "Secworks," \[Online\]. Available at https://github.com/secworks. | ||
| 2478 | +2. NIST, Federal Information Processing Standards Publication (FIPS PUB) 180-4 Secure Hash Standard (SHS). | ||
| 2479 | +3. OpenSSL \[Online\]. Available at https://www.openssl.org/docs/man3.0/man3/SHA512.html. | ||
| 2480 | +4. N. W. Group, RFC 3394, Advanced Encryption Standard (AES) Key Wrap Algorithm, 2002. | ||
| 2481 | +5. NIST, Federal Information Processing Standards Publication (FIPS) 198-1, The Keyed-Hash Message Authentication Code, 2008. | ||
| 2482 | +6. N. W. Group, RFC 4868, Using HMAC-SHA256, HMAC-SHA384, and HMAC-SHA512 with IPsec, 2007. | ||
| 2483 | +7. RFC 6979, Deterministic Usage of the Digital Signature Algorithm (DSA) and Elliptic Curve Digital Signature Algorithm (ECDSA), 2013. | ||
| 2484 | +8. TCG, Hardware Requirements for a Device Identifier Composition Engine, 2018. | ||
| | 2485 | +9. Coron, J.-S.: Resistance against differential power analysis for elliptic curve cryptosystems. In: Koç, Ç.K., Paar, C. (eds.) CHES 1999. LNCS, vol. 1717, pp. 292–302. | ||
| | 2486 | +10. Schindler, W., Wiemers, A.: Efficient side-channel attacks on scalar blinding on elliptic curves with special structure. In: NIST Workshop on ECC Standards (2015). | ||
| 2487 | +11. National Institute of Standards and Technology, "Digital Signature Standard (DSS)", Federal Information Processing Standards Publication (FIPS PUB) 186-4, July 2013. | ||
| 1707 | 2488 | 12. NIST SP 800-90A, Rev 1: "Recommendation for Random Number Generation Using Deterministic Random Bit Generators", 2012. | | |
| 1708 | -13. CHIPS Alliance, “RISC-V VeeR EL2 Programmer’s Reference Manual” \[Online\] Available at https://github.com/chipsalliance/Cores-VeeR-EL2/blob/main/docs/RISC-V_VeeR_EL2_PRM.pdf. | ||
| 1709 | -14. “The RISC-V Instruction Set Manual, Volume I: User-Level ISA, Document Version 20191213”, Editors Andrew Waterman and Krste Asanović, RISC-V Foundation, December 2019. Available at https://riscv.org/technical/specifications/. | ||
| 1710 | -15. “The RISC-V Instruction Set Manual, Volume II: Privileged Architecture, Document Version 20211203”, Editors Andrew Waterman, Krste Asanović, and John Hauser, RISC-V International, December 2021. Available at https://riscv.org/technical/specifications/. | ||
| 2489 | +13. CHIPS Alliance, “RISC-V VeeR EL2 Programmer’s Reference Manual” \[Online\] Available at https://github.com/chipsalliance/Cores-VeeR-EL2/blob/main/docs/RISC-V_VeeR_EL2_PRM.pdf. | ||
| 2490 | +14. “The RISC-V Instruction Set Manual, Volume I: User-Level ISA, Document Version 20191213”, Editors Andrew Waterman and Krste Asanovi ́c, RISC-V Foundation, December 2019. Available at https://riscv.org/technical/specifications/. | ||
| 2491 | +15. “The RISC-V Instruction Set Manual, Volume II: Privileged Architecture, Document Version 20211203”, Editors Andrew Waterman, Krste Asanovi ́c, and John Hauser, RISC-V International, December 2021. Available at https://riscv.org/technical/specifications/. | ||
| | 2492 | +16. NIST SP 800-56A, Rev 3: "Recommendation for Pair-Wise Key-Establishment Schemes Using Discrete Logarithm Cryptography", 2018. | | |
| 1711 | 2493 | ||
| 1712 | 2494 | <sup>[1]</sup> _Caliptra: Spanish for “root cap”; it describes the deepest part of the root._ | |