[Operating Systems] CPU Virtualization - Limited Direct Execution, Context Switching

다영 윤
Mar 16, 2022
6 min read

Updated: Mar 17, 2022

[Reference] 'Operating Systems: Three Easy Pieces' Chapter 6. Mechanism: Limited Direct Execution

There are two main problems when running mutiple processes on the CPU through time sharing.

Problem 1) Performance How can we implement time sharing without causing excessive overhead to the system?
Problem 2) Control How can the operating system regain control of the CPU (from a running user process) and switch between different processes?

Problem 1 is solved by using the 'Limited Direct Execution' protocol & 'Mode Switch'.

Problem 2 is solved through 'Timer Interrupts' & 'Context Switch'.

Performance - Limited Direct Execution & Mode Switch

In the direct exection protocol, we create a process and start running its main() routine. The process runs natively on the CPU and the OS has to wait for the process to complete to gain back control.

There are two problems with this apporach. One is that the OS has no control over what the process can do, which means it can perform dangerous operations such as corrupting another process's address space. Another is that the operating system can't stop a process and switch to a different one, thus time sharing is not possible.

The second problem will be addressed in the second part of the post where we discuss timer interrupts and context switching. In this first part, we talk about the OS solution for solving the first problem.

Restriceted Operation Problem

Processes have to perform operations such as accessing the file system, creating/destroying processes, communicating with different processes, etc. These are restricted operations because they can lead to dangerous behavior as we mentioned before. Developers came up with the 'Limited Direct Execution' protocol to allow processes to perfrom restricted operation without having control over the system.

In this protocol, there are two modes of operation, user mode and kernel mode. As the name suggets, priviliged operations can be run only in the kernel mode by the OS(kernel). User mode has less priviliges and can only run user program code. With this method, a user program can't access system resources itself; it has to politely ask the OS to perform restricted operations when needed. The interface for asking the OS to perform priviliged operations are called system calls.

A system call is like a normal procedure call, but it internally invokes a trap instruction. A trap instruction jumps to the kernel and raises the privilige level to kernel mode, allowing the kernel to perform whatever operations the user program has asked for. The user program tells the OS what operation is wants done by specifying the system call number (sometimes by writing it in an agreed upon stack location or register). The kernel will run the corresponing trap handler (code to be run when ceratin exceptions occur) by referencing the trap table.

The trap table is set up by the operating system at boot time, and contains the location of the trap hanlder for each trap(system call) or interruption. Once the kernel has done its job, it calls a return-from-trap instruction, which reduces the privilige to user mode and gives back control of the CPU to a process.

Note that the user program only knows the system call number and never the location of the trap handler. Otherwise, trap hanlder code can be easily corrupted to perform malicious behaviour.

This entire process is summed up into the limited direct execution protocol.

Mode Switch

Mode switching includes several steps that need to be done when switching between user mode and kernel mode.

Trap: user mode -> kernel mode
1. Store process's CPU state (register context) into its kernel stack
2. Chage the stack pointer to the kernel stack from the user stack (esp: user stack -> kernel stack)
3. Jump to the kernel code (eip: user code -> kernel code)
Return-from-trap: kernel mode -> user mode
1. Restore CPU state from the process's kernel stack
2. Change the stack pointer to the user stack from the kernel stack (esp: kernel stack -> user stack)
3. Jump to the user code (eip: user code -> kernel code)

Kernel stack is allocated for each process and is placed in the memory right next to the process's address space. It can be tricky to know how it is different from the user stack.

User Stack: inside the address space. stores program data (local variables, function parameters, return addresses...) -> only used in 'user mode'
Kernel Stack: seperate from addresss space. stores register context so process can resume after a trap. -> only used in 'kernel mode'

Trap Table Structure in x86 - Interrupt Descriptor Table

In x86 systems, after the OS initializes the trap table called IDT(Interrupt Descriptor Table) at boot time, and the base address to each IDT entry will be saved in special registers called IDTR (Interrupt Descriptor Table Register). When the CPU receives an interrupt request, it will examine the interrupt number and look up the IDT base address from the corresponding IDTR. Now we know the IDT entry containing the location of the trap handler we need. Trap handlers are called ISRs (Interrupt Service Routine) in the x86 system.

Fun fact: the reason we can't use any keyboard or mouse input at boot time is because the OS hasn't finished stroring the base address of IDT entries into the IDTR registers.

For example, if the user program invoked a system call with interrupt number 0x80, and interrupt reqeust to the interrupt contoller will be created. The CPU will refernce the IDTR register corresponding to the 0x80 interrupt and get finally get the address to the correct ISR from the IDT table.

Control - Timer Interrupts & Context Switch

Let's look back at the first part of the post where we talked about the two problems with virtualizaing the CPU.

Problem 2) Control How can the operating system regain control of the CPU (from a running user process) and switch between different processes?

The operating system has to stop a running process and resume another (switch between processes) to make sure each of them get enough time running on the CPU (thus th e term time sharing). That is why we are concerened with the problem of the OS regaining control from a running process. When a process is running, the OS can't do anything to regain control since... Mmm, it isn't running!

Computer scientists decided to get some help from the hardware to solve this problem. A timer interrupt is invoked from a timer device every so many milliseconds, returning contorl back to the OS. Without timer interrupts, the operating system will have to wait submissively for a process to give back control voluntarily, either through system calls or traps invoked by illegal operations (devision by zero, illegal memory access). This cooperative approach obviously has problems of processes ending up in infinite loops and failing to transfer control.

Nowadays, we use the non-cooperative approach, where the OS regains control periodically and securely through hardware initiated timer interrupts. The timer device is starter by the OS at boot time. Once the OS gains back control, it now decides if it will switch to a different process or continue running the current one. If the decision is to switch, a context switch routine is executed to save the old process's state and restore the new process's state (memory state + CPU state).

In the above scenario, process A is being stopped and process B is resumed. The steps colored in blue, storing/restoring register context to/from the kernel stack, are related with mode switching. The steps in yellow show how a timer interrupt invokes a mode switch, transfering control to the kernel. Context switch occurs since the kernel decides to switch to process B. In the context switch, process A's register values are saved to its proc_sturct(C structure representing the 'Process Control Block') and process B's register values are restored, also from its proc_struct. The register values saved in process A's proc_struct will later be restored when the OS chooses to resume it.

Also, restoring process B's eip(instrucion pointer) and esp(stack pointer) register will have some additional effects. Restoring the esp register means the current stack is changed to the new process's kernel stack. Restoring the eip register means we are jumping from process A's code to process B's code.

Note that the kernel stack is used to hold register context temporarily for mode switching, and the proc_struct (C structure representing the 'Process Control Block') is used for saving registers in context switching.

Context switch is, eventually, just a piece of low level code.

mov 4(%esp), %eax The current stack is the kernel stack of the old process. esp is pointing to the return address. Therefore 4(%esp) is the location of the old context (the context structure in proc_struct).

So now, eax is pointing to the old process's context in its PCB.

popl 0(%eax) This pops from the return address from the old process's kernel stack and saved as the eip value in the context file. -> Stores old eip

mov %esp, 4(%eax) ~ mov %ebp, 28(%eax) The stack pointer register (address of the old process's kernel stack) is saved in the context file, along with all the other general registers. -> Stores old esp (& other register context)

mov 4(%esp), %eax eax is now pointing to the register context of the new process that will be resumed.

mov 28(%eax), %ebp ~ mov 4(%eax), %esp Restores new esp (& other register context) -> Current stack is changed to new process's kernel stack

push 0(%eax) Pushes the return address back on the stack. -> Restores new eip

Performance - Limited Direct Execution & Mode Switch

Restriceted Operation Problem

Mode Switch

Trap Table Structure in x86 - Interrupt Descriptor Table

Control - Timer Interrupts & Context Switch

Comments