Spectre and Meltdown: an explanation of the 3 security vulnerabilities and the measures required to protect your services against them (for IT specialists)
Since the Spectre and Meltdown vulnerabilities were discovered on x86-64 processors and publicly announced last week, the IT sector has been working hard to resolve the security issues they raise. First of all, we needed to protect all computer hardware against the possibility of these vulnerabilities being exploited. Then we needed to measure the impact of the patches on system performance and explain the security threats, so that users can quickly apply the updates pushed by operating system and software publishers. A lot of research was also required. It must be said that it is very tricky to fully understand these security vulnerabilities and the ways in which they can be exploited. They are built into the design of the CPU, at the hardware level, which is unprecedented, at least on this scale. The measures taken against them are varied: patches for software and operating systems to prevent data from being leaked, and changes to processor microcode (released as firmware updates).
In this article, it is not our aim to offer a widely accessible explanation of how attacks exploiting the Meltdown and Spectre vulnerabilities would work. Rather, we would like to contribute to the technical understanding of these vulnerabilities, from the perspective of one of our security architects. To stay updated on the operations in progress at OVH to protect infrastructures against Spectre and Meltdown, and find all of the information published to help you understand the situation, please go to .
Speculative code execution
Spectre and Meltdown cover three distinct attack vectors: Spectre can be exploited using two different methods, and Meltdown using one. Nonetheless, all three attack vectors are linked to the speculative code execution of CPUs. Behind this complex term lie a number of hidden optimisations carried out over the course of around ten years by the CPU manufacturers, to improve processing speed. In simple terms, it means that the CPU starts to execute an instruction before the previous instruction has finished executing.
The recent x86 CPUs are, in fact, CISC processors with RISC cores. An instruction decoding unit decodes each x86 instruction sent to the CPU, and breaks it down into several micro-instructions, which are then executed by the CPU’s arithmetic logic units. Each CPU core has several of these specialised arithmetic logic units. For example, a Haswell CPU can carry out four arithmetic instructions, and four parallel memory accesses.
To gain a better understanding of this, below is a simple case with the following two instructions:
- increment the R1 register
- increment the R2 register
These two instructions will be broken down into micro-instructions - in this case, two arithmetic instructions. These two instructions are independent from one another, and since the CPU has four arithmetic logic units, they will be executed simultaneously.
Now, we can take a look at the following two instructions:
- load into the R2 register the value stored at the memory address R1 + 10
- increment the R3 register
The first instruction will be broken down into two micro-instructions:
- ① calculate R1+10
- ② load the value stored at the previously calculated memory address and store it in R2
The second instruction yields only one micro-instruction:
- ③ calculate R3+1
Micro-instruction ② depends on ①, but ① and ③ are independent of one another. As a result, the CPU can launch ① and ③ simultaneously on its arithmetic logic units. Once ① is complete, it launches micro-instruction ②, and once that is complete, it considers both instructions to be complete.
In fact, the second instruction will be finished before the first. However, to maintain consistency, the CPU makes the changes appear in order. So even if the CPU has already calculated the new value of R3, the register will only be modified once the first instruction is complete. Being able to pre-calculate values during slow instructions (a memory load, for example), as in the example above, increases the CPU’s throughput.
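The ordering described above can be sketched with a toy scheduling model. This is only an illustration under stated assumptions: the latencies (1 cycle for arithmetic, 100 cycles for the memory load) are invented for the example, the model assumes an execution unit is always free, and the names `finish_cycle` and `retire_cycle` are hypothetical.

```c
#include <stdio.h>

/* Toy model of the three micro-instructions from the text.
 * Latencies are illustrative assumptions, not measurements. */
enum { N_UOPS = 3, NO_DEP = -1 };

struct uop {
    const char *name;
    int latency;  /* cycles the micro-op takes once it starts (assumed) */
    int dep;      /* index of the micro-op it must wait for, or NO_DEP */
};

static const struct uop program[N_UOPS] = {
    { "1: R1+10",        1, NO_DEP },
    { "2: load -> R2", 100, 0      },  /* slow load, waits for micro-op 1 */
    { "3: R3+1",         1, NO_DEP },
};

/* Cycle at which micro-op i finishes executing: it starts as soon as
 * its dependency (if any) has finished. */
int finish_cycle(int i) {
    int start = (program[i].dep == NO_DEP) ? 0 : finish_cycle(program[i].dep);
    return start + program[i].latency;
}

/* Cycle at which the result of micro-op i becomes visible: results are
 * committed in program order, so never before the earlier micro-ops. */
int retire_cycle(int i) {
    int own = finish_cycle(i);
    if (i == 0)
        return own;
    int prev = retire_cycle(i - 1);
    return own > prev ? own : prev;
}

/* Print the execution and commit cycle of every micro-op. */
void print_schedule(void) {
    for (int i = 0; i < N_UOPS; i++)
        printf("%-14s finishes at cycle %3d, retires at cycle %3d\n",
               program[i].name, finish_cycle(i), retire_cycle(i));
}
```

In this model, the increment of R3 is computed long before the load completes, yet its result only becomes visible once the load has retired.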
In short, this is what is known as speculative execution: the anticipated launch of an instruction, which can go as far as guessing which future instructions will be executed (the result is discarded if it turns out that the instruction did not need to be executed). We will come back to this further on.
Coming back to our example: what happens if the first micro-instruction generates an error? Normally when this happens, the CPU discards everything it has calculated, and raises the error. Unfortunately, these pre-calculations have been found to leave visible traces outside of the CPU. These traces are subtle, but measurable, as the first attack described below shows.
Coming back to the three attack vectors, we will start with the simplest one: Meltdown.
Normally, the kernel memory cannot be accessed by a user program. So when a program tries to access it, it will generate an error. In the case we are using, here are the instructions executed:
- load into the R1 register the value stored at a kernel address
- load into the R2 register the value stored at an address that depends on the value previously retrieved
Ideally, the first instruction is executed, generates an error, the execution flow stops, and the second instruction is never executed. However, access errors are only handled at the end of the CPU’s pipeline, once all of the micro-instructions have been executed. This allows the second instruction to be executed, if the required execution unit is available. When the error for the first instruction is raised, the CPU is supposed to cancel what it has done. But to put it simply, the load from memory has left a trace in the CPU’s cache. To accelerate memory accesses, the CPU has a cache that stores data that has been recently accessed. If the value that we are trying to read is stored in the cache, the access time is significantly shorter. By timing an access, a program can therefore find out whether the value it is reading was stored in the cache.
Knowing this is not of much use on its own: here, we want to do the operation in reverse. We start by filling the cache with our own data, then execute the code above. Since the data that the CPU tries to read is not in the cache, it fills part of the cache with this data (thus evicting part of the data already present to make room). The algorithm that determines where data is stored in the cache is simple: it depends only on the address of the data. The part of the cache that is evicted therefore depends on the address loaded (which itself depends on the value we want to retrieve). We then read our own data again whilst measuring access times, and find out which part of the cache has been evicted to deduce the value we are after.
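The fill, evict and re-read sequence described above can be sketched with a toy model of the cache. This is a simulation, not an exploit: it assumes a direct-mapped cache with 256 sets of 64 bytes, it replaces the timing measurement with a hit/miss return value, and all names (`cache_access`, `recover_byte`, `ATTACKER_BASE`, `PROBE_BASE`) are hypothetical.

```c
#include <stddef.h>

/* Toy direct-mapped cache: one tag per set. Sizes are assumptions. */
enum { LINE_SIZE = 64, NUM_SETS = 256 };

/* Two address ranges that map onto the same sets with different tags. */
#define ATTACKER_BASE ((size_t)1 * NUM_SETS * LINE_SIZE)
#define PROBE_BASE    ((size_t)2 * NUM_SETS * LINE_SIZE)

static size_t cache_tag[NUM_SETS];  /* tag 0 stands for unrelated data */

static size_t set_of(size_t addr) { return (addr / LINE_SIZE) % NUM_SETS; }
static size_t tag_of(size_t addr) { return addr / LINE_SIZE / NUM_SETS; }

/* Access an address: returns 1 on a hit, 0 on a miss. A miss fills the
 * line and evicts whatever was in that set. On real hardware the
 * attacker tells the two cases apart by measuring the access time. */
int cache_access(size_t addr) {
    size_t s = set_of(addr);
    if (cache_tag[s] == tag_of(addr))
        return 1;
    cache_tag[s] = tag_of(addr);
    return 0;
}

/* Recover the byte used as an index by the cancelled load. */
int recover_byte(unsigned char secret) {
    /* 1. Fill every set of the cache with the attacker's own data. */
    for (size_t s = 0; s < NUM_SETS; s++)
        cache_access(ATTACKER_BASE + s * LINE_SIZE);

    /* 2. The cancelled load touches an address that depends on the
     *    secret value, evicting exactly one attacker line. */
    cache_access(PROBE_BASE + (size_t)secret * LINE_SIZE);

    /* 3. Re-read the attacker's data: the one miss reveals the secret. */
    int found = -1;
    for (size_t s = 0; s < NUM_SETS; s++)
        if (!cache_access(ATTACKER_BASE + s * LINE_SIZE))
            found = (int)s;
    return found;
}
```

In this model, `recover_byte(42)` returns 42 without the attacker code ever reading the secret directly: only the position of the evicted line is observed.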
What needs to be done for Meltdown? For the Linux kernel, the answer is a kernel patch called PTI (Page Table Isolation). It ensures that the kernel’s memory ranges no longer appear in the address space of processes (previously they appeared, but were inaccessible). The result: it is impossible for a user-mode instruction to read a kernel value, even speculatively. Unfortunately, for each system call, the kernel needs to switch the memory mapping, which comes at a cost: it decreases performance. Other operating systems such as macOS or Windows have deployed similar patches.
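As a practical aside, recent Linux kernels report the state of this mitigation in the file /sys/devices/system/cpu/vulnerabilities/meltdown, which typically contains "Mitigation: PTI" when the patch is active. The sketch below reads such a status file; the helper names `read_mitigation_status` and `status_mentions_pti` are hypothetical, and the exact wording of the status line may differ between kernel versions.

```c
#include <stdio.h>
#include <string.h>

/* Read the first line of a status file into buf (len must be > 0).
 * Returns 0 on success, -1 if the file cannot be opened or read. */
int read_mitigation_status(const char *path, char *buf, size_t len) {
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fgets(buf, (int)len, f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0';  /* strip the trailing newline */
    return 0;
}

/* Returns 1 if the status line mentions PTI, 0 otherwise. */
int status_mentions_pti(const char *status) {
    return strstr(status, "PTI") != NULL;
}
```

A caller would pass the sysfs path above and treat a return value of -1 as "unknown", for example on a kernel that predates this interface.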
With Spectre, the method of attack differs slightly, but it also relies on the time the CPU takes to access the memory to retrieve data.
Spectre variant 1 (Bounds check bypass)
Let’s take a look at the following code:
if (x < y)
    z = array1[x];
This code simply checks that x is within bounds before reading array1. Logically, we would expect the CPU to first verify that “x” is less than “y” before continuing. However, in order to save time, the CPU will not always wait for the condition to be evaluated before running the operation. It will speculate as to whether the condition is true or false. If it predicts that the result is going to be true, it will run the next instructions while it waits for the result of the condition. If the prediction was correct, some time will have been saved. Otherwise, as before, it cancels what it had started to execute.
In this example, a potential attacker would need to train the CPU to predict the condition as true, and then supply a totally arbitrary value of x. Let’s imagine that reading array1[x] corresponds to reading the address array1 + x. If we manage to supply the value x = (attacked address - array1 address), the CPU will read array1 address + (attacked address - array1 address), that is, the attacked address. So far, we haven’t gone very far, because we have only managed to get the CPU to read something at an address that we already knew: the value read is not yet visible to us.
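The address arithmetic above can be checked with a few lines of C. This is a minimal sketch assuming array1 holds one-byte elements, so that array1[x] is read at address array1 + x; the function names are hypothetical, and addresses are handled as plain unsigned integers to keep the arithmetic well defined.

```c
#include <stdint.h>

/* Index to supply so that array1[x] lands on the attacked address.
 * Unsigned arithmetic wraps around, so this also works when the
 * attacked address is lower than the address of array1. */
uintptr_t malicious_index(uintptr_t array1_addr, uintptr_t attacked_addr) {
    return attacked_addr - array1_addr;
}

/* Address that the CPU reads when it evaluates array1[x]. */
uintptr_t effective_address(uintptr_t array1_addr, uintptr_t x) {
    return array1_addr + x;
}
```

For example, with array1 at 0x1000 and an attacked address of 0x5000, the index is 0x4000 and the effective address is 0x5000 again.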
Now, let’s take a look at the following code:
if (x < y)
    z = array2[array1[x]];
In this example, the CPU is supposed to read an address which depends on the result of array1[x]. However, as we have just seen, we can manipulate x so that array1[x] is the value that we want to retrieve. The CPU will therefore load an address which depends on this value. If we go back to the details mentioned for Meltdown, the CPU will have to evict a line of cache to load this data, and we will therefore be able to deduce the value that exists at the attacked address.
In this case, we are dealing with a CPU logic problem, which is down to the way the processors’ hardware is designed. This cannot be patched at the kernel level. One solution is to modify the programs generated (we are talking about patching the compilers), by inserting a serializing instruction after the if(), explicitly asking the CPU not to speculate any further. Intel and AMD recommend using LFENCE. But, given that speculative execution aims to save time, we are forcing the CPU to wait, and naturally, this impacts performance.
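The double read can be illustrated with a toy model in which speculation is simulated explicitly. This is a sketch under several assumptions: memory is a small simulated array, the mispredicted branch is represented by a flag, and, for brevity, the model records which line of array2 was loaded into the cache rather than which line was evicted. All names and sizes are hypothetical.

```c
#include <stddef.h>
#include <string.h>

enum { MEM_SIZE = 32, ARRAY1_SIZE = 8, SECRET_OFFSET = 20, PROBE_LINES = 256 };

/* Simulated memory: array1 occupies offsets 0..7, the secret sits
 * further away, outside the bounds of array1. */
static unsigned char memory[MEM_SIZE];

/* Simulated cache state: line_cached[k] is 1 when the line of array2
 * selected by the value k has been loaded. */
static int line_cached[PROBE_LINES];

void place_secret(unsigned char value) { memory[SECRET_OFFSET] = value; }

/* Model of: if (x < y) z = array2[array1[x]];
 * When the branch is predicted as taken, the body runs even for an
 * out-of-bounds x. The result is then discarded, but the trace left
 * in the cache is not. */
void victim(size_t x, int predicted_taken) {
    if (x >= MEM_SIZE)
        return;                      /* limit of the simulation only */
    if (x < (size_t)ARRAY1_SIZE || predicted_taken) {
        unsigned char value = memory[x];  /* array1[x] */
        line_cached[value] = 1;           /* array2[value] is loaded */
    }
}

/* Attacker: clear the cache state, trigger the victim with an
 * out-of-bounds index, then look for the line that was loaded. */
int leak_byte(size_t x, int branch_trained) {
    memset(line_cached, 0, sizeof line_cached);
    victim(x, branch_trained);
    for (int k = 0; k < PROBE_LINES; k++)
        if (line_cached[k])
            return k;
    return -1;
}
```

After `place_secret(0x5A)`, calling `leak_byte(SECRET_OFFSET, 1)` returns 0x5A, whereas `leak_byte(SECRET_OFFSET, 0)` returns -1: without the trained misprediction, the bounds check holds and nothing leaks.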
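By way of illustration, here is roughly what such a barrier looks like when placed by hand rather than by the compiler. This is a sketch under stated assumptions: `_mm_lfence()` is the SSE2 intrinsic that emits LFENCE on x86 compilers, the macro `SPECULATION_BARRIER` and the function `bounded_read` are hypothetical names, and the fallback branch (GCC/Clang syntax) is only a compiler barrier, not a real speculation barrier.

```c
#include <stddef.h>

#if defined(__SSE2__)
#include <emmintrin.h>  /* _mm_lfence */
#define SPECULATION_BARRIER() _mm_lfence()
#else
/* Assumption: on other targets this only stops compiler reordering.
 * It does NOT stop the CPU from speculating. */
#define SPECULATION_BARRIER() __asm__ __volatile__("" ::: "memory")
#endif

/* Bounds-checked read with a barrier between the check and the load.
 * Returns fallback when x is out of bounds. */
unsigned char bounded_read(const unsigned char *array1, size_t array1_size,
                           size_t x, unsigned char fallback) {
    if (x < array1_size) {
        /* The load below is not started until the comparison above
         * has actually been resolved. */
        SPECULATION_BARRIER();
        return array1[x];
    }
    return fallback;
}
```

The functional behaviour is unchanged; only the timing is affected, which is precisely the performance cost mentioned above.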
Spectre variant 2 (Branch Target Injection)
This variant is the most vicious.
x86 CPUs let you call a function whose address is stored in memory. Because retrieving this value from memory is time-consuming, the CPU will try to guess the address, and it will start executing the instructions found there. Once it has the real address, it checks whether it guessed the right location. If it did, it has saved time. If not, it cancels what it began to execute. It’s a win-win situation.
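In C, such a call typically comes from a function pointer. The sketch below is only meant to show what kind of code is concerned; the names `handlers` and `dispatch` are hypothetical, and a compiler is free to optimise a simple case like this one differently.

```c
typedef int (*handler_fn)(int);

static int add_one(int v) { return v + 1; }
static int twice(int v)   { return v * 2; }

/* Table of function addresses stored in memory. */
static handler_fn handlers[] = { add_one, twice };

/* The target of the call is only known once handlers[which] has been
 * loaded from memory. This is the point at which the CPU guesses the
 * destination and starts executing ahead. */
int dispatch(unsigned which, int v) {
    handler_fn target = handlers[which % 2];
    return target(v);  /* indirect call */
}
```

Virtual method calls and callback tables compile to the same kind of indirect call.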
But what happens if we manage to train the CPU to predict incorrectly? And what if we manage to control what it incorrectly predicts? And what if we manage to call a function which will read memory based on a value we want to retrieve? Remind you of anything?
This is exactly what researchers have managed to do. As with the other two vulnerabilities, when the memory is read, a trace is left which means the value can be extracted.
There is no way to patch the kernel to secure applications in this instance. As with Spectre variant 1, one solution is to add to the code generated by the compilers, to prevent the CPU from predicting the location of functions.
There are two ways to patch this vulnerability: the first is to modify the compilers (Retpoline); the second relies on a CPU firmware patch, which includes a new feature that Intel is working on, allowing the kernel to control CPU predictions. But, given that the kernel is also vulnerable to this method of attack, it has to be patched as well.
As you can probably tell, exploiting Meltdown and Spectre vulnerabilities is no small matter. Exploiting vulnerabilities in a lab setting is all well and good, but we have no information to date indicating an attack by this means in a real environment. Further ways of exploiting these vulnerabilities will probably be invented in the future. Nevertheless, we shouldn’t underestimate those working on this today. And let’s take advantage of the head start we have on them, this time, to pull the rug out from under them.