400x faster Matrix multiplication for Ruby

A friend, Allan de Medeiros Martins, has made me lose some time playing with Restricted Boltzmann Machines, just for fun!
Matrix multiplication is critical to the performance of the algorithm we’ve been discussing. Ruby has a Matrix class in the standard library, and its Matrix#* method does the job!
But the whole thing was really slow compared to the MATLAB version of the code.
Then I implemented a simple version of matrix multiplication using an Array of Arrays, and I was surprised to find it was around 2.5x faster than the specialized Matrix#* method. Unfortunately, that was still not fast enough.
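
To make the comparison concrete, here is a minimal sketch of what an Array-of-Arrays multiplication can look like, timed against Matrix#* with the standard Benchmark module. It is not the exact code from my benchmark: the method name array_dot, the matrix size n and the random seed are assumptions chosen only for illustration, and the timings will vary from machine to machine.

```ruby
require "matrix"
require "benchmark"

# Naive product of two matrices stored as Arrays of row Arrays.
# (array_dot is a hypothetical name used only for this sketch.)
def array_dot(a, b)
  b_columns = b.transpose # lets us walk each column of b as a plain Array
  a.map do |row|
    b_columns.map do |col|
      sum = 0.0
      row.each_with_index { |x, i| sum += x * col[i] }
      sum
    end
  end
end

n = 60 # assumed size, small enough to run quickly
rng = Random.new(42)
a = Array.new(n) { Array.new(n) { rng.rand } }
b = Array.new(n) { Array.new(n) { rng.rand } }

# Same data wrapped in standard library Matrix objects.
ma = Matrix.rows(a)
mb = Matrix.rows(b)

Benchmark.bm(12) do |x|
  x.report("Matrix#*")  { ma * mb }
  x.report("array_dot") { array_dot(a, b) }
end
```

Transposing b once up front keeps the inner loop on simple Array indexing, which is probably where most of the gain over Matrix#* comes from.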
Searching around, I (re)discovered the SciRuby project and its NMatrix library. AMAZING project! The NMatrix#dot method does true matrix multiplication (the matrix product), while NMatrix#* is just element-by-element multiplication. Using NMatrix#dot I got about 3x faster than Matrix#* and around 1.2x faster than my Array version. But something was not right: I was expecting a much more significant improvement in speed. After digging around, gotcha! In one part of the code I was using NMatrix#map, and this method was returning an “untyped” NMatrix object, as the documentation warns:

“Note that map will always return an :object matrix, because it has no way of knowing how to handle operations on the different dtypes.”

Well, with an untyped matrix all the specialized C algorithms are disabled and we can’t get a good speed boost. So I changed the code to guarantee that a :float64 NMatrix was used in every step of the algorithm. Boom! NMatrix#dot with dtype: :float64 is more than 400x faster than Matrix#*.
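
The gotcha can be sketched in a few lines. This is only an illustration, not the code from my project: the 2x2 values and the doubling block passed to map are made up, and it assumes the nmatrix gem is installed (the example skips itself otherwise). It uses NMatrix#cast to bring the result of map back to :float64 before calling dot; that is one possible way to keep the dtype stable, not necessarily the only one.

```ruby
begin
  require "nmatrix"
rescue LoadError
  warn "nmatrix gem not installed; skipping the example"
end

if defined?(NMatrix)
  # Two small matrices created explicitly as :float64.
  a = NMatrix.new([2, 2], [1.0, 2.0, 3.0, 4.0], dtype: :float64)
  b = NMatrix.new([2, 2], [5.0, 6.0, 7.0, 8.0], dtype: :float64)

  # map hands back an :object matrix, as the documentation quoted above says.
  mapped = a.map { |x| x * 2.0 }
  puts "after map:  #{mapped.dtype}"

  # Cast back to :float64 so the next dot works on a typed matrix again.
  typed = mapped.cast(dtype: :float64)
  puts "after cast: #{typed.dtype}"

  product = typed.dot(b)
  puts "product:    #{product.dtype}"
end
```

Printing dtype after each step is a cheap way to find the place where a matrix silently becomes :object.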

Benchmark Result Screenshot

See code at:

https://github.com/abinoam/matrix_dot_bench – /exe/matrix_dot_bench

Environment:

ruby 2.4.1p111 (2017-03-22 revision 58053) [x86_64-darwin15]
System Version: OS X 10.11.6 (15G1217)
Kernel Version: Darwin 15.6.0
MacBook Pro (13-inch, Late 2011)
2.4 GHz Intel Core i5
16 GB 1600 MHz DDR3
