The authors demonstrate that their system is not a ``work-around'' that sacrifices protection for speed. They use several interesting techniques to achieve efficient cross-domain control transfer.
The primary technique for implementing LRPC is the protected procedure call. This is a form of procedure call that passes through the kernel, which validates the call and enforces domain protection. Two kernel traps are still required, but the context switches are reduced in scope because no synchronization between two actual threads is needed. No message buffer management or parameter copying is required.
The most interesting and unusual aspect of the LRPC system, with respect to traditional RPC implementations, lies in the interface binding and procedure call mechanism. Bershad et al. delve fairly deeply into the details in their paper, but we'll just touch on the real highlights of the approach here.
The binding and call mechanisms work in close concert in this system, with some intelligent management of kernel- and thread-level data structures. For each procedure in the interface, the kernel allocates a number of argument stacks (A-stacks) and linkage records for use in calling that procedure. The A-stacks are mapped as shared memory common to both the client and server, removing the need to copy parameter messages between the two domains' address spaces. The kernel returns a binding object and a list of A-stacks for each procedure in the interface; the client uses the binding object and an A-stack to make a call into the server.
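A rough sketch in C may make these bind-time structures concrete. The paper does not present code at this level, so every name and field below is our own invention, chosen only to mirror the description above.

\begin{verbatim}
#include <stddef.h>

/* Hypothetical rendering of the bind-time structures; the field
 * names are ours, not the authors'. */

typedef struct {
    void  *base;       /* region mapped into both client and server */
    size_t size;       /* fixed size derived from the argument list */
    int    in_use;     /* claimed by a call in progress? */
} astack_t;

typedef struct {
    void *return_addr; /* where the caller resumes */
    void *caller_sp;   /* the caller's stack pointer */
} linkage_t;           /* one linkage record paired with each A-stack */

typedef struct {
    int       server_domain;  /* identifies the server's address space */
    astack_t *astacks;        /* per-procedure A-stack list */
    size_t    num_astacks;
} binding_t;           /* returned to the client by the kernel at bind */
\end{verbatim}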
When the procedure call takes place, the stub on the client side first determines whether the call is local to the machine or whether a traditional RPC call is required. This is done as the first instruction of the stub, based on information in the binding object, which avoids wasting cycles on the remote RPC path for calls that are actually local. The overhead of this test for a remote call is negligible, since network communication remains the bottleneck in that case.
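The following hypothetical client stub illustrates this dispatch test; all identifiers are invented for the sketch and do not come from the paper.

\begin{verbatim}
#include <stddef.h>

typedef struct {
    int is_remote;     /* set at bind time */
    /* ... A-stack list, server domain, etc. ... */
} binding_t;

typedef struct astack astack_t;   /* opaque here */

/* Assumed helpers provided by the runtime (hypothetical). */
extern int       remote_rpc(binding_t *, int proc_id, void *args, size_t n);
extern astack_t *claim_astack(binding_t *, int proc_id);
extern void      push_args(astack_t *, void *args, size_t n);
extern int       lrpc_trap(binding_t *, astack_t *, int proc_id);

int example_stub(binding_t *b, int arg)
{
    /* The very first test decides local vs. remote, so local calls
     * never touch the remote-RPC machinery. */
    if (b->is_remote)
        return remote_rpc(b, /*proc_id=*/1, &arg, sizeof arg);

    astack_t *as = claim_astack(b, 1);  /* pick a free A-stack */
    push_args(as, &arg, sizeof arg);    /* args land in shared memory */
    return lrpc_trap(b, as, 1);         /* trap into the kernel */
}
\end{verbatim}

Placing the locality test first means the common local case pays for exactly one branch before proceeding to the fast path.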
For local RPC calls, an A-stack is selected and the arguments are pushed into that space. Since the A-stack is shared by the caller and callee, no further copying of the parameters is necessary. The stub then invokes a kernel trap, passing in the binding object, the A-stack, and the identifier of the procedure being called. Without performing a thread switch, the kernel verifies the binding, locates or allocates an execution stack (E-stack) in the server's address space, and updates the thread's registers so that the E-stack becomes its stack. The linkage record associated with the A-stack is recorded in the thread control block for use during the return. The kernel then performs a virtual memory context switch and transfers control to the server-side stub.
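To make the call path concrete, here is a hypothetical rendering of the kernel-side trap handler. All names are ours, and real kernel code would of course involve far more detail; the point is that everything happens on the caller's own thread, with no server thread to wake.

\begin{verbatim}
/* Hypothetical sketch of the kernel-side call path. */

typedef struct { void *pc, *sp; } regs_t;
typedef struct { void *return_addr, *caller_sp; } linkage_t;
typedef struct {
    regs_t    regs;
    linkage_t linkage;        /* lives in the thread control block */
    int       caller_domain;
} thread_t;
typedef struct { void *top; } estack_t;
typedef struct binding binding_t;   /* as sketched earlier */

/* Assumed primitives provided by the rest of the kernel. */
extern int       binding_is_valid(binding_t *, int proc_id);
extern estack_t *find_or_alloc_estack(binding_t *);
extern int       server_domain_of(binding_t *);
extern int       current_domain(void);
extern void      switch_vm_context(int domain);
extern void     *server_stub_entry(binding_t *, int proc_id);
extern void      jump_to(void *entry);
extern void      abort_call(thread_t *);

void lrpc_call_trap(thread_t *t, binding_t *b, int proc_id)
{
    if (!binding_is_valid(b, proc_id))       /* authenticate the caller */
        abort_call(t);

    estack_t *es = find_or_alloc_estack(b);

    /* Record return linkage in the thread control block, where the
     * server cannot tamper with it. */
    t->linkage.return_addr = t->regs.pc;
    t->linkage.caller_sp   = t->regs.sp;
    t->caller_domain       = current_domain();

    t->regs.sp = es->top;                    /* run on the server's E-stack */
    switch_vm_context(server_domain_of(b));  /* the only real context switch */
    jump_to(server_stub_entry(b, proc_id));  /* enter the server stub */
}
\end{verbatim}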
The most important aspect of this call mechanism is that no thread needs to exist on the server side; the calling thread from the client simply continues executing in the server's domain. This makes the context switch substantially lighter, since only the address space and a small amount of processor state need to change, with no scheduler involvement.
The return mechanism is much simpler. This is possible because all authentication was performed at call time and recorded in the linkage record. Since the linkage record has been kept in the thread control block, the kernel needs only to restore the caller's registers and reverse the virtual memory context switch. Once this is done, the thread continues in its original context as the caller.
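A matching sketch of the return trap, again with invented names, shows how little work remains on this path.

\begin{verbatim}
typedef struct { void *pc, *sp; } regs_t;
typedef struct { void *return_addr, *caller_sp; } linkage_t;
typedef struct {
    regs_t    regs;
    linkage_t linkage;
    int       caller_domain;
} thread_t;

extern void switch_vm_context(int domain);

/* No re-validation is needed: the linkage record was written by the
 * kernel at call time and cannot be forged by the server. */
void lrpc_return_trap(thread_t *t)
{
    switch_vm_context(t->caller_domain);  /* back to the client's space */
    t->regs.sp = t->linkage.caller_sp;    /* restore the caller's stack */
    t->regs.pc = t->linkage.return_addr;  /* resume at the call site */
}
\end{verbatim}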