Joel Emer is a processor architecture researcher whose work spans machines from the VAX and the Alpha to the x86. Before joining NVIDIA and MIT, he was Director of Microarchitecture Research at Intel. Emer is perhaps best known for a 1984 paper written with Doug Clark, "A Characterization of Processor Performance in the VAX-11/780." The paper presented data from a hardware monitor that captured cycle counts and instruction execution frequencies in the VAX-11/780 pipeline. It helped establish Cycles Per Instruction (CPI) as a performance metric, along with the practice of reporting average time per instruction measured on real application code. Here is the MIT CSAIL website for Emer with his more recent publications.
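As a reminder of what the CPI metric measures, here is a minimal sketch of the standard relationship: CPI is total cycles divided by instructions executed, and execution time is instruction count times CPI divided by clock frequency. The function names and the cycle, instruction, and clock figures below are hypothetical illustrations, not data from the Emer and Clark paper.

```python
def cpi(cycles: int, instructions: int) -> float:
    """Cycles Per Instruction: total cycles divided by instructions executed."""
    if instructions <= 0:
        raise ValueError("instructions must be positive")
    return cycles / instructions


def exec_time_seconds(instructions: int, cpi_value: float, clock_hz: float) -> float:
    """Execution time = instruction count x CPI / clock frequency."""
    return instructions * cpi_value / clock_hz


if __name__ == "__main__":
    # Hypothetical monitor readings: 5,000,000 cycles for 500,000 instructions.
    measured_cpi = cpi(5_000_000, 500_000)
    # Hypothetical 5 MHz clock, chosen only to keep the arithmetic simple.
    print(measured_cpi, exec_time_seconds(500_000, measured_cpi, 5_000_000))
```

Averaging over real application runs, rather than over instruction-set tables, is what makes the resulting CPI useful for predicting the run time of actual workloads.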
Much of Wall Street analytics is essentially expression evaluation inside Monte Carlo simulation. If the code is written efficiently, that is where most of the execution cycles for NIM optimization will go. Using Emer's performance characterization, we can estimate that a contemporary x86 needs about 10 picoseconds on average for each IEEE 754 double-precision add or multiply. To get a back-of-the-envelope estimate of the execution time for expression evaluation code, simply count the double-precision adds and multiplies required and multiply that count by 10 picoseconds. The dominant source of error in this approximation is the number of processor cycles needed to read the operands through the cache hierarchy. The approximation assumes that, on average, each operand is available every 1.0 processor cycle. In real executions operand fetches might take 1.x processor cycles, typically depending on how many non-compulsory L2 misses the code suffers. The combination of multicore processors, low-cost grids, and a 10 picosecond average operation time brings these multi-billion security accrual portfolio simulations within feasible execution scope.
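The estimate above can be sketched as a tiny cost model. This is a minimal illustration under stated assumptions: the 10 ps per double add or multiply comes from the text. Three things are hypothetical. First, the operand_cycles factor linearly scales the estimate to model the 1.x-cycle operand fetches. Second, the cores parameter assumes perfect parallel scaling across cores or grid nodes. Third, the workload sizes in the usage example are made up.

```python
# Back-of-the-envelope cost model for expression evaluation inside a Monte Carlo loop.
# From the text: ~10 ps average per IEEE 754 double add or multiply,
# assuming operands arrive every 1.0 processor cycle on average.
SECONDS_PER_FLOP = 10e-12


def estimate_seconds(adds_per_eval: int, muls_per_eval: int, evaluations: int,
                     operand_cycles: float = 1.0, cores: int = 1) -> float:
    """Estimate run time as (adds + muls) x evaluations x 10 ps.

    operand_cycles: hypothetical linear scale for average operand-fetch cost;
        1.0 reproduces the base approximation, values like 1.2 model extra
        cycles lost to non-compulsory L2 misses.
    cores: hypothetical divisor assuming perfect scaling across cores/grid nodes.
    """
    if operand_cycles < 1.0:
        raise ValueError("operand_cycles is assumed to be at least 1.0")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    flops = (adds_per_eval + muls_per_eval) * evaluations
    return flops * SECONDS_PER_FLOP * operand_cycles / cores


if __name__ == "__main__":
    # Hypothetical workload: 1,000,000 paths x 360 time steps,
    # with an expression costing 20 adds and 20 multiplies per step.
    evals = 1_000_000 * 360
    print(f"ideal operands: {estimate_seconds(20, 20, evals):.3f} s")
    print(f"1.3-cycle operands: {estimate_seconds(20, 20, evals, operand_cycles=1.3):.3f} s")
    print(f"1.3-cycle operands on 8 cores: "
          f"{estimate_seconds(20, 20, evals, operand_cycles=1.3, cores=8):.4f} s")
```

Keeping the operand-fetch factor as an explicit parameter makes the dominant error term in the approximation visible. A profiler's measured cycles per operand can then be plugged in instead of the idealized 1.0.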