Matrix multiplication is better suited to many-core programming (using GPUs). With CPU multi-threading, it would make sense to use one thread per row for very large matrices, or ...
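To make the per-row idea concrete, here is a minimal sketch in Python, assuming plain nested lists for the matrices. The names `multiply_row`, `matmul_by_rows`, and the `workers` parameter are made up for illustration. It only shows how the work can be partitioned by row; in standard CPython the global interpreter lock keeps these threads from running the arithmetic in parallel, so a real speedup would need native threads or a GPU.

```python
from concurrent.futures import ThreadPoolExecutor


def multiply_row(row, b):
    """Compute one row of the product: row (length n) times b (n x p)."""
    inner = len(b)
    cols = len(b[0])
    return [sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]


def matmul_by_rows(a, b, workers=4):
    """Multiply a (m x n) by b (n x p), giving each row of a to a worker.

    Hypothetical helper: the one-task-per-row split is the assumption
    being illustrated, not a tuned implementation.
    """
    if not a or not b or len(a[0]) != len(b):
        raise ValueError("inner dimensions must match")
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so output rows line up with rows of a.
        return list(pool.map(lambda row: multiply_row(row, b), a))
```

Each output row depends only on one row of the left matrix and all of the right matrix, so the rows can be computed independently without locking.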
The program counter starts, and the first instruction is sent from the Instruction RAM to the Instruction Splitter. The Instruction Splitter splits the instruction into the opcode, the address of A, the address of B ...
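The fetch-and-split step can be sketched as a software model. The layout below is an assumption, not the original design: a 16-bit instruction word with a 4-bit opcode in the top bits, followed by a 6-bit address of A and a 6-bit address of B. Only the three fields named above are decoded, and `split_instruction` and `fetch` are hypothetical names.

```python
# Assumed layout (hypothetical, not taken from the original circuit):
#   bits 15..12  opcode
#   bits 11..6   address of A
#   bits  5..0   address of B
OPCODE_BITS = 4
ADDR_BITS = 6


def split_instruction(word):
    """Split one instruction word into (opcode, address of A, address of B)."""
    addr_mask = (1 << ADDR_BITS) - 1
    opcode_mask = (1 << OPCODE_BITS) - 1
    opcode = (word >> (2 * ADDR_BITS)) & opcode_mask
    addr_a = (word >> ADDR_BITS) & addr_mask
    addr_b = word & addr_mask
    return opcode, addr_a, addr_b


def fetch(instruction_ram, program_counter):
    """Read the word the program counter points at and pass it to the splitter."""
    return split_instruction(instruction_ram[program_counter])
```

For example, with the assumed layout, `fetch([(3 << 12) | (5 << 6) | 9], 0)` yields opcode 3, address of A 5, and address of B 9. In hardware the splitter is just wiring that routes bit ranges to different outputs; the shifts and masks here play that role.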