In these systems, a process often does not know where another process is located, or on which processor it runs, because this information is protected by the operating system. That location transparency provides a strong assurance of system integrity. Unfortunately, using an RPC mechanism to achieve the transparency can impose a great deal of overhead. Researchers at the University of Washington found that, for the multiprocessing platforms they were concerned with, most RPC calls were actually directed to the local machine (94.5% under Taos for the DEC Firefly). In the common case, then, the RPC machinery was not merely adding some overhead; most of the mechanism was overhead. Multiprocessor systems are not priced to be wasted resources!
Analysis of the procedure calls handled through the RPC mechanism showed that most calls went to a small number of procedures, and that simple byte copying was all that was required to transfer the data passed as parameters. Most calls also passed little data: a majority passed fewer than 200 bytes, and most of those fewer than 50. Other research has suggested optimizations for the small-parameter case, as well as compiler-based mechanisms for passing parameters directly in machine registers.
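The observation that parameters are small and need only byte copying suggests a very thin client stub. The sketch below is illustrative only: the buffer layout, procedure index, and fall-back path are assumptions made here, not details of any cited implementation.

    /* Minimal sketch of the small, byte-copyable argument case described
     * above.  The buffer layout, procedure index, and dispatch step are
     * hypothetical; they stand in for whatever shared region and kernel
     * entry point a real RPC implementation would provide. */
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    #define ARG_BUF_SIZE 256           /* most calls fit in well under 200 bytes */

    struct arg_buffer {
        uint32_t proc_index;           /* which of the few hot procedures to call */
        uint32_t length;               /* bytes of argument data actually used */
        uint8_t  data[ARG_BUF_SIZE];   /* flat, byte-copied parameters */
    };

    /* Client-side stub: no marshalling logic, just a byte copy of the
     * already-flat parameters into the (hypothetically shared) buffer. */
    static int stub_call(struct arg_buffer *shared, uint32_t proc,
                         const void *args, uint32_t len)
    {
        if (len > ARG_BUF_SIZE)
            return -1;                 /* fall back to a general-purpose path */
        shared->proc_index = proc;
        shared->length = len;
        memcpy(shared->data, args, len);
        /* A real implementation would trap to the kernel here to transfer
         * control to the server; this sketch only reports what it would send. */
        printf("dispatch proc %u with %u argument bytes\n",
               (unsigned)proc, (unsigned)len);
        return 0;
    }

    int main(void)
    {
        struct arg_buffer buf;
        struct { int32_t fd; int32_t offset; } read_args = { 3, 1024 };
        return stub_call(&buf, 7, &read_args, sizeof read_args);
    }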
Much of the overhead in the RPC mechanism is required to maintain system integrity and access protections. For any procedure call, a theoretical minimum dispatch cost can be computed: one procedure call and return, two kernel traps, and two context switches, where each context switch involves changing both process state and virtual memory context. Any additional work required to implement RPC is considered overhead of the mechanism rather than useful work. The authors of the reference cited below identify several issues that must be addressed under all circumstances.
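As a rough model of that floor (the notation below is introduced here, not taken from the cited work), the minimum dispatch cost is simply the sum of the unavoidable steps:

    T_min = T_call_return + 2 * T_trap + 2 * T_context_switch

Whatever an implementation spends beyond T_min is overhead of the mechanism itself.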
As mentioned above, some implementations use optimization techniques to reduce particular parts of this overhead, but no single research project has achieved a significant reduction in the total overhead required. The optimizations include eliminating message copying, handoff scheduling, and shared message buffers; heavyweight stubs remain a significant bottleneck.
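To make the shared-message-buffer idea concrete, the following is a minimal sketch, assuming a POSIX-like system; fork() and a shared anonymous mapping stand in for two protection domains, and none of the details below are drawn from the cited systems.

    /* Illustrative sketch of a shared message buffer: a region mapped into
     * both the "client" and "server" address spaces so argument bytes are
     * written exactly once and never copied between domains.  fork() is
     * only an approximation of two protection domains. */
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
        /* One shared, anonymous mapping visible to both parent and child. */
        char *buf = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                         MAP_SHARED | MAP_ANONYMOUS, -1, 0);
        if (buf == MAP_FAILED)
            return 1;

        strcpy(buf, "request: proc=7, 8 argument bytes");   /* written once */

        pid_t pid = fork();
        if (pid == 0) {               /* "server" domain reads in place */
            printf("server sees: %s\n", buf);
            _exit(0);
        }
        waitpid(pid, NULL, 0);
        munmap(buf, 4096);
        return 0;
    }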