When realizing a programming language as VM, implementing behavior as part of the VM, as primitive, usually results in reduced execution times. But supporting and developing primitive functions requires more effort than maintaining and using code in the hosted language since debugging is harder, and the turn-around times for VM parts are higher. Furthermore, source artifacts of primitive functions are seldom reused in new implementations of the same language. And if they are reused, the existing API usually is emulated, reducing the performance gains. Because of recent results in tracing dynamic compilation, the trade-off between performance and ease of implementation, reuse, and changeability might now be decided adversely.
In this work, we investigate the trade-offs when creating primitives, and in particular how large a difference remains between primitive and hosted function run times in VMs with tracing just-in-time compiler. To that end, we implemented the algorithmic primitive BitBlt three times for RSqueak/VM. RSqueak/VM is a Smalltalk VM utilizing the PyPy RPython toolchain. We compare primitive implementations in C, RPython, and Smalltalk, showing that due to the tracing just-in-time compiler, the performance gap has lessened by one magnitude to one magnitude.