I am reporting a significant performance degradation when multiplying a vector with a 2D tensor column-wise (254 TOPS measured) compared with doing it row-wise (419 TOPS measured). It is an int8 matmul ...
The workaround is to also break the connection to the float before assigning a new vector, then reconnect the float node to the second input of the multiply. It seems like the type matching test ...