Changes

Renaud Pacalet · e781a330
--- a/trescca-demo.md
+++ b/trescca-demo.md
+# TRESCCA final demonstration
+
+## Hardware and software requirements
+
+* A host PC with the software drivers for serial communication with the ZedBoard (Cypress `CY7C64225` USB-to-UART bridge) and a serial console (`minicom`, `cu`, `screen`...).
+* An attacker PC with the software drivers for the FTDI `FT232R` USB-to-UART bridge and the `libftdi` library.
+* An SD card (formatted to be compatible with the Zynq boot ROM code).
+* A USB-UART Pmod module from Digilent.
+* Two USB-A to micro-USB cables.
+
+Note: a single PC can play both roles but, ideally, two separate PCs should be used in order to clearly differentiate between the host system under attack and the attacker.
+
+The ZedBoard runs a Linux / BusyBox software stack, including a mock-up player software application that displays the content of 4 memory pages on which the HSM enforces 4 different security policies.
+
+The attacker PC runs an attacking software application that reads and writes in the player's memory pages.
+
+The security policy applied to the memory area available to the Linux / BusyBox software stack is configurable through the 8 slide switches of the ZedBoard. The software stack includes several performance benchmarks (`Dhrystone`, `Whetstone`, `ramspeed`, `RAMsmp`) that can be used to estimate the performance impact of the HSM in various configurations. The Linux boot sequence is timed, which makes it possible to observe the impact of the HSM in the boot messages (`dmesg`).
+
+## Hardware description
+
+The provided hardware design runs at 50 MHz and embeds:
+
+* The HSM
+* A Xilinx AXI UART module
+* A UART2MAXILITE instance, connected to the Xilinx AXI UART module on one end and to the HP1_AXI PL-to-PS port on the other end
+* An AXI monitor that records the memory accesses by the CPU in a memory map
+* A debugging module that drives the 8 LEDs with one of four possible sources (switches, HSM, UART2MAXILITE, AXI monitor)
+
+The centre press button is a soft reset that resets some internal registers and the internal memories of the AXI monitor. Use with care as it is not always possible to recover from a soft reset without a hard reset.
+
+The north press button is used to select the debugging information sent to the LEDs.
+
+The configuration of the 8 slide switches at boot time is used to select the HSM configuration. At run time it has an influence on the debugging information produced by some hardware modules (see below).
+
+The CPU accesses the external DDR only through the PL, in the `[0x4000_0000, 0x8000_0000[` memory range (Alternate Address Space - AAS). The HSM protects all memory accesses according to the static configuration applied by the First Stage Boot Loader (FSBL), as explained in the next section.
+
+The Xilinx AXI UART module and the UART2MAXILITE instance allow the attacking software running on the attacker PC to access the content of the complete external DDR either for reading or writing. This is used to demonstrate sniffing and injection attacks and to show how the HSM can prevent or detect them. From the attacking software perspective the 512 MB external DDR is located in the `[0x0000_0000, 0x2000_0000[` range.
+
+The AXI monitor traces all memory accesses performed by the CPU in the AAS and records them in two 1k x 32-bit block RAMs, one for read accesses and the other for write accesses. This is mainly used for debugging, to understand which memory region is accessed by which software component. At reset time the two block RAMs are initialized with zeroes. Then, the monitor observes bits 29...15 of the addresses issued by the CPU and sets the corresponding bit in the read or the write RAM, depending on the access type. The content of the two RAMs can be read out from the attacker PC in the `[0x2000_0000, 0x2000_2000[` range.
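+
+As a worked illustration of this indexing (a sketch that only assumes the 15 observed address bits are read as an unsigned integer $`n`$; how $`n`$ maps to a word and a bit position inside each RAM is not specified here), consider a CPU access to AAS address $`a`$ = `0x4fff_c000`:
+
+```math
+n = \left\lfloor \frac{a}{2^{15}} \right\rfloor \bmod 2^{15}, \qquad a = \texttt{0x4fff\_c000} \;\Rightarrow\; n = \texttt{0x1fff} = 8191
+```
+
+Each bit therefore covers a 32 kB window ($`2^{15}`$ bytes), and the $`2^{15} = 1024 \times 32`$ possible values of $`n`$ exactly fill one 1k x 32-bit RAM.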
+
+The debugging module sends one of four 8-bit sources, numbered 0 to 3, to the 8 LEDs:
+
+| Index | Source                                |
+| :---  | :---                                  |
+| 0     | the slide switches                    |
+| 1     | the GPO output from the HSM           |
+| 2     | the GPO output from the UART2MAXILITE |
+| 3     | the GPO output from the AXI monitor   |
+
+After reset, source 0 is selected. Pressing the north press button cycles through the four sources. The new source index (0, 1, 2 or 3) is displayed on the LEDs while the button is pressed, and the new source itself is displayed once the button is released.
+
+The GPO output from the HSM (source number 1) depends on the slide switches:
+
+| SWITCHES      | GPO_HSM                         |
+| :---          | :---                            |
+| `0x00`        | `SBGPO`                         |
+| `VAL != 0x00` | `AW\|W\|AR\|R\|B\|0\|IRQ_C_R\|IRQ_E_R` |
+
+where:
+
+* `AW`, `W`, `AR`, `R` and `B` are 5 one-bit GP0_AXI activity indicators,
+* `IRQ_C_R` is the registered interrupt request issued by the HSM to indicate the completion of an atomic command,
+* `IRQ_E_R` is the registered interrupt request issued by the HSM to signal an integrity violation and
+* `SBGPO` is an 8-bit write-only register mapped at address `0x78` (relative to the base address of the HSM interface registers).
+
+The GP0_AXI activity indicators are computed from five 8-bit counters counting the completed AXI transactions on each of the 5 AXI channels: address write (`AW`), data write (`W`), address read (`AR`), read response (`R`) and write response (`B`). Each counter is bitwise ANDed with `VAL`, the current state of the slide switches, and the result is OR-reduced to produce the corresponding GP0_AXI activity indicator.
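+
+This reduction can be written compactly as follows. The notation is introduced here for illustration only: $`c_X[k]`$ stands for bit $`k`$ of the counter of channel $`X`$ and $`\mathit{VAL}[k]`$ for the state of slide switch $`k`$.
+
+```math
+X = \bigvee_{k=0}^{7} \left( c_X[k] \wedge \mathit{VAL}[k] \right), \qquad X \in \{\mathit{AW}, \mathit{W}, \mathit{AR}, \mathit{R}, \mathit{B}\}
+```
+
+For example, with `VAL = 0x01` each indicator follows bit 0 of its counter and toggles at every completed transaction, while with `VAL = 0x80` it follows bit 7 and toggles once every 128 transactions. This assumes that the counters increment by one per transaction and wrap around on overflow.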
+
+The GPO output from the UART2MAXILITE also depends on the slide switches. If the value indicated by the switches is the index of one of its 9 internal registers, GPO carries the value of this register. Otherwise GPO is set to:
+
+    00|ARESETN|SRST|RESETN_LOCAL|BEAT|RRX|RTX
+
+where `ARESETN`, `SRST` and `RESETN_LOCAL` are the system, soft and local resets, respectively, `BEAT` is a life indicator that blinks at $`1/2^{24}`$ of the PL clock frequency, and `RRX` and `RTX` are the registered UART RX and TX lines, respectively.
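+
+As a rough order of magnitude for `BEAT`, and assuming that the PL clock is the 50 MHz clock of the hardware design mentioned earlier:
+
+```math
+f_{\mathit{BEAT}} = \frac{50 \times 10^{6}\ \text{Hz}}{2^{24}} \approx 2.98\ \text{Hz}
+```
+
+This is about three blinks per second, slow enough to be visible on an LED.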
+
+The GPO output from the AXI monitor is constant `0xff`.
+
+## HSM configuration
+
+The HSM is configured by the First Stage Boot Loader (FSBL). The FSBL reads the status of the 8 slide switches, considered as an 8-bit integer (SW0 being the least significant bit) and, based on this value, selects one configuration:
+
+| Slide switches | Code name |
+| :---           | :---      |
+| 0              | G4K_NN    |
+| 1              | G4K_ON    |
+| 4              | G4K_CN    |
+| 5              | G4K_NT    |
+| 6              | G4K_CT    |
+| 7              | G64K_NN   |
+| 8              | G64K_ON   |
+| 11             | G64K_CN   |
+| 12             | G64K_NT   |
+| 13             | G64K_CT   |
+| 14             | G1M_NN    |
+| 15             | G1M_ON    |
+| 18             | G1M_CN    |
+| 19             | G1M_NT    |
+| 20             | G1M_CT    |
+| 21             | G16M_NN   |
+| 22             | G16M_ON   |
+| 25             | G16M_CN   |
+| 26             | G16M_NT   |
+| 27             | G16M_CT   |
+| 28             | HD        |
+
+In the last configuration, denoted `HD` (28) for `HSM_DISABLED`, the HSM is disabled.
+
+In the description of the other configurations the confidentiality modes are denoted `N`, `O` or `C` for `NONE`, `OTP` or `CBC` and the integrity modes are denoted `N`, `M` or `T` for `NONE`, `MAC` or `MACT`. These configurations are denoted by a `Gss_ci` code name where:
+
+* `ss` in `{4K, 64K, 1M, 16M}` is a page granularity
+* `c` in `{N, O, C}` is the confidentiality mode applied to the memory area usable by U-Boot and Linux
+* `i` in `{N, M, T}` is the integrity mode applied to the memory area usable by U-Boot and Linux
+
+Not all $`4*3*3=36`$ theoretical configurations make sense: the 8 `Gss_OT` and `Gss_CM` configurations assume that the memory region is read-only for confidentiality protection and read-write for integrity protection, or the opposite. They are discarded. The 8 remaining `Gss_cM` configurations (`Gss_NM` and `Gss_OM`) are also discarded because the memory area usable by U-Boot and Linux is read-write and cannot be integrity-protected with MAC sets. Using them usually freezes the system just after the FSBL jumps into U-Boot.
+
+Example: `G64K_CT` is the configuration where the protection is applied by pages of 64kB (when possible) and where the memory area usable by U-Boot and Linux is protected in CBC/MACT.
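+
+Reading the table of configurations, the switch value appears to follow a regular pattern, sketched below. This is an observation drawn from the listed indices only, not from the FSBL source, and $`g`$ and $`k`$ are notations introduced here.
+
+```math
+\text{index} = 7g + k, \qquad g \in \{0\,(\text{4K}),\ 1\,(\text{64K}),\ 2\,(\text{1M}),\ 3\,(\text{16M})\}, \qquad k \in \{0\,(\text{NN}),\ 1\,(\text{ON}),\ 4\,(\text{CN}),\ 5\,(\text{NT}),\ 6\,(\text{CT})\}
+```
+
+For `G64K_CT` this gives $`7 \times 1 + 6 = 13`$, that is `0b00001101`: switches SW0, SW2 and SW3 set and all others cleared. `HD` then sits at $`7 \times 4 = 28`$.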
+
+In all configurations but HD:
+
+* FSBL loads U-Boot with `load=0x5800_0000,startup=0x5800_0000`
+* U-Boot is patched (including its device tree) to use only `[0x5000_0000, 0x6000_0000[`
+* U-Boot loads the device tree of the Linux kernel at `0x52a0_0000`
+* U-Boot loads the root filesystem at `0x5400_0000`
+* U-Boot loads the Linux kernel at `0x5300_0000`
+* The load address of the `uImage` is `0x5000_8000`
+* The device tree of the Linux kernel is patched to use only `[0x5000_0000, 0x6000_0000[`
+
+The memory layout is (in Regular Address Space - RAS - addresses):
+
+| Range                        | Description                                 |
+| :---                         | :---                                        |
+| `[0x0000_0000, 0x0100_0000[` | unused                                      |
+| `[0x0100_0000, 0x0110_0000[` | HSM master block                            |
+| `[0x0110_0000, 0x0800_0000[` | unused                                      |
+| `[0x0800_0000, 0xzzzz_zzzz[` | HSM pages of MAC set and trees              |
+| `[0xzzzz_zzzz, 0x0fff_c000[` | unused                                      |
+| `[0x0fff_c000, 0x1000_0000[` | pages of the player application ($`4*4`$kB) |
+| `[0x1000_0000, 0x2000_0000[` | available memory for U-Boot and Linux       |
+
+where `0xzzzz_zzzz` depends on the configuration but is always strictly less than `0x0fff_c000`. TODO: prove this and also prove that the allocator of MAC sets and trees works.
+
+The applied protections are:
+
+| Range                        | Protection                  |
+| :---                         | :---                        |
+| `[0x0000_0000, 0x0fff_c000[` | `NN`                        |
+| `[0x0fff_c000, 0x0fff_d000[` | `NN`                        |
+| `[0x0fff_d000, 0x0fff_e000[` | `CN`                        |
+| `[0x0fff_e000, 0x0fff_f000[` | `NT`                        |
+| `[0x0fff_f000, 0x1000_0000[` | `CT`                        |
+| `[0x1000_0000, 0x2000_0000[` | `ci` (the `ci` of `Gss_ci`) |
+
+The only things that change from one configuration to the other (except `HD`) are the page granularity and the protection applied to `[0x1000_0000, 0x2000_0000[`. The chosen page granularity is always the largest possible that is less than or equal to `ss`. Memory areas are sub-divided when necessary. For example, in configuration `G1M_CT` the areas and their protections are:
+
+| Range                        | Protection |
+| :---                         | :---       |
+| `[0x0000_0000, 0x0ff0_0000[` | `G1M_NN`   |
+| `[0x0ff0_0000, 0x0fff_0000[` | `G64K_NN`  |
+| `[0x0fff_0000, 0x0fff_c000[` | `G4K_NN`   |
+| `[0x0fff_c000, 0x0fff_d000[` | `G4K_NN`   |
+| `[0x0fff_d000, 0x0fff_e000[` | `G4K_CN`   |
+| `[0x0fff_e000, 0x0fff_f000[` | `G4K_NT`   |
+| `[0x0fff_f000, 0x1000_0000[` | `G4K_CT`   |
+| `[0x1000_0000, 0x2000_0000[` | `G1M_CT`   |
+
+and in `G16M_ON`:
+
+| Range                        | Protection |
+| :---                         | :---       |
+| `[0x0000_0000, 0x0f00_0000[` | `G16M_NN`  |
+| `[0x0f00_0000, 0x0ff0_0000[` | `G1M_NN`   |
+| `[0x0ff0_0000, 0x0fff_0000[` | `G64K_NN`  |
+| `[0x0fff_0000, 0x0fff_c000[` | `G4K_NN`   |
+| `[0x0fff_c000, 0x0fff_d000[` | `G4K_NN`   |
+| `[0x0fff_d000, 0x0fff_e000[` | `G4K_CN`   |
+| `[0x0fff_e000, 0x0fff_f000[` | `G4K_NT`   |
+| `[0x0fff_f000, 0x1000_0000[` | `G4K_CT`   |
+| `[0x1000_0000, 0x2000_0000[` | `G16M_ON`  |
+
+Only the Page Security Parameter Entries (PSPEs) corresponding to the largest page size are initialized. The PSPEs that are under the scope of a valid PSPE of a larger size are uninitialized. TODO: check that it actually works.
+
+For each integrity-protected page (`M` or `T`), the FSBL relies on a MAC set and MAC tree allocator (see `fsbl/new/msta.{h,c}`) to select a page of MAC sets or MAC trees and an index of set or tree in the page. In the worst case (`Gxx_xT`) 256MB$`+2*4`$kB must be protected with MAC trees (256MB of Linux kernel available memory plus two 4kB pages for the player application). This leads to 21846 4kB pages of MAC trees, or 87384 kB. Accounting for the fragmentation introduced by the allocator, this should fit in `[0x0800_0000, 0x0fff_c000[` (TODO: check this).
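+
+The figures above can be rederived as follows. The ratio of one page of MAC trees per three protected pages is inferred from the quoted numbers, not stated explicitly, and should be checked against the allocator.
+
+```math
+\begin{aligned}
+N_{\text{data}} &= \frac{256\ \text{MB}}{4\ \text{kB}} + 2 = 65536 + 2 = 65538 \\
+N_{\text{MAC}} &= \frac{65538}{3} = 21846 \quad\Rightarrow\quad 21846 \times 4\ \text{kB} = 87384\ \text{kB} \\
+\texttt{0x0fff\_c000} - \texttt{0x0800\_0000} &= \texttt{0x07ff\_c000} = 131056\ \text{kB}
+\end{aligned}
+```
+
+The MAC trees would thus occupy about two thirds of the reserved region, leaving 43672 kB of slack to absorb fragmentation.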
+
+## Download
+
+Download the archive on the attacker PC:
+
+```bash
+eve@attacker> mkdir trescca-demo
+eve@attacker> cd trescca-demo
+eve@attacker> wget --no-check-certificate https://gitlab.telecom-paristech.fr/renaud.pacalet/secbus-private/wikis/downloads/sdcard-trescca.tgz
+```
+
+## Prepare SD card
+
+Insert an SD card in the SD card reader of the attacker PC and unpack the archive:
+
+```bash
+eve@attacker> tar --directory=/media/SDCARD -xf sdcard-trescca.tgz
+```
+
+## Attacker applications
+
+Copy the attacker applications to the attacker PC and compile them:
+
+```bash
+eve@attacker> cp -r /media/SDCARD/C .
+eve@attacker> cd C
+eve@attacker> make u2m_m2f u2m_f2m
+```
+
+If necessary, first install `libftdi`:
+
+```bash
+eve@attacker> sudo apt-get install libftdi-dev
+```
+
+## Boot ZedBoard
+
+Eject the SD card from the attacker PC and insert it in the ZedBoard's SD card slot. Connect the first USB-A to micro-USB cable to the USB-UART connector of the ZedBoard on one end and to the host PC on the other end. Power up the ZedBoard and launch a serial console on the host PC (`minicom`, `cu`, `screen`...).
+
+Connect as user `root` (password `secbus`) and launch the player (use `secbus` as decryption password):
+
+```bash
+root@secbus> player
+password:
+
+Play from:
+1. Unprotected page
+2. Encrypted page
+3. Integrity-checked page
+4. Encrypted and integrity-checked page
+
+0. Quit
+ >>
+```
+
+The player reads an encrypted data file (`/mnt/data/data.bin`), asks for the decryption password, decrypts the file and stores its content at the beginning of four 4-kB memory pages with various security policies:
+
+| Page index | Protection                                                             | Address range                 |
+| :---       | :---                                                                   | :---                          |
+| 1          | unprotected                                                            | `[0x4fff_c000...0x4fff_d000[` |
+| 2          | encrypted in CBC mode                                                  | `[0x4fff_d000...0x4fff_e000[` |
+| 3          | checked for integrity with trees of CBC-MACs                           | `[0x4fff_e000...0x4fff_f000[` |
+| 4          | encrypted in CBC mode and checked for integrity with trees of CBC-MACs | `[0x4fff_f000...0x5000_0000[` |
+
+When presented with the menu, type 1, 2, 3 or 4 (no newline) to show that the 4 pages are all equivalent and that their protection is transparent. At each keystroke, the corresponding page is read from the (un-cached) memory and printed to the console.
+
+## Attack
+
+Plug the USB-UART Pmod module from Digilent to the top 6-pins row of the JA Pmod connector of the ZedBoard (closest to the 8 switches and LEDs). Connect the attacker PC to the USB-UART Pmod module using the second USB A-micro USB cable.
+
+From the attacker PC, retrieve the content of the 4 memory pages:
+
+```bash
+eve@attacker> ./u2m_m2f 0x0fffc000 672 page1.dat
+eve@attacker> ./u2m_m2f 0x0fffd000 672 page2.dat
+eve@attacker> ./u2m_m2f 0x0fffe000 672 page3.dat
+eve@attacker> ./u2m_m2f 0x0ffff000 672 page4.dat
+```
+
+Note that the pages' base addresses, from the host's and from the attacker's perspectives, differ by 1GB. This is due to the fact that the processor in the ZedBoard accesses the memory through the AXI general purpose master port `M_AXI_GP0` of the Processing System (mapped to the `[0x4000_0000...0x8000_0000[` address range), while the attacker is directly connected to the AXI high performance slave port `S_AXI_HP1` of the Processing System (mapped to the `[0x0000_0000...0x4000_0000[` address range).
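+
+Spelled out for the four player pages, the translation is a plain subtraction of the 1GB offset ($`a_{\text{host}}`$ and $`a_{\text{attacker}}`$ are notations introduced here for the two views of the same location):
+
+```math
+\begin{aligned}
+a_{\text{attacker}} &= a_{\text{host}} - \texttt{0x4000\_0000} \\
+\texttt{0x4fff\_c000} &\mapsto \texttt{0x0fff\_c000} \\
+\texttt{0x4fff\_d000} &\mapsto \texttt{0x0fff\_d000} \\
+\texttt{0x4fff\_e000} &\mapsto \texttt{0x0fff\_e000} \\
+\texttt{0x4fff\_f000} &\mapsto \texttt{0x0fff\_f000}
+\end{aligned}
+```
+
+The right-hand values are the addresses passed to `u2m_m2f`.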
+
+Show that the 2 pages in clear form (1 and 3) disclose their content while the two encrypted ones (2 and 4) don't.
+
+From the attacker PC, modify the content of the 4 files and send them back to memory:
+
+```bash
+eve@attacker> vim page*.dat
+eve@attacker> ./u2m_f2m 0x0fffc000 page1.dat
+eve@attacker> ./u2m_f2m 0x0fffd000 page2.dat
+eve@attacker> ./u2m_f2m 0x0fffe000 page3.dat
+eve@attacker> ./u2m_f2m 0x0ffff000 page4.dat
+```
+
+Before reading the content of the attacked memory pages on the ZedBoard, set the switches to a configuration other than `0x00` and use the north press button to send the HSM debug output (source number 1) to the LEDs, so that the registered and sticky interrupts of the HSM are routed to the two rightmost LEDs.
+
+From the ZedBoard, play the 2 pages whose integrity is not protected and show that the modification is undetected.
+
+From the ZedBoard, play one of the 2 pages whose integrity is protected and show that the modification is detected. The LED connected to the HSM interrupt request line illuminates, the HSM stops forwarding the memory access requests and the system freezes.
+
+## Performance evaluation
+
+First select one of the 29 available configurations (avoid the 8 `Gss_cM`), set the slide switches to reflect the index of the selected configuration (switch `SW0` being the LSB and `SW7` being the MSB) and reboot the board. Be patient because in some configurations the boot time can be up to one or two minutes. Connect as root and type:
+
+```bash
+perf.sh
+```
+
+This will launch a series of benchmarks and store their results in a directory named `/root/perf/Gss_ci` (where `ss`, `c` and `i` correspond to the selected configuration). Here too, be patient: depending on the configuration the benchmarks can take up to 3 or 4 hours! After all benchmarks have been run, if the SD card has a second partition (`ext3` is the recommended partition format because it seems to be the most stable), it is mounted on `/mnt` and the `/root/perf/Gss_ci` directory is copied to it. Otherwise, find another way to save the results: the root file system is an `initramfs` and its content is not preserved from one boot to the next.
+
+Each `Gss_ci` benchmark result directory contains:
+
+| File name                | Output of            |
+| :---                     | :---                 |
+| `dmesg.txt`              | `dmesg`              |
+| `dhrystone_10000000.txt` | `dhrystone 10000000` |
+| `whetstone_10000.txt`    | `whetstone 10000`    |
+| `ramspeed_b1_g1.txt`     | `ramspeed -b 1 -g 1` |
+| `ramspeed_b2_g1.txt`     | `ramspeed -b 2 -g 1` |
+| `ramsmp_b1_g1.txt`       | `ramsmp -b 1 -g 1`   |
+| `ramsmp_b2_g1.txt`       | `ramsmp -b 2 -g 1`   |
+
+During the benchmarks, if the selected debug output is that of the HSM (source index 1) and if the slide switches are set to `0x00`, the LEDs display the content of the `SBGPO` HSM internal register that changes after each individual benchmark: `0x00`, `0x01`, `0x02`, `0x04`... until `0x40` after `ramsmp -b 2 -g 1`. Finally, after copying the results on the SD card, all LEDs illuminate (`SBGPO=0xff`) to show that the complete set of benchmarks for this configuration completed and that it is time to select another configuration.
+
+<!-- vim: set tabstop=4 softtabstop=4 shiftwidth=4 noexpandtab textwidth=0: -->