Multiplying and inverting matrices

Hello

After a first attempt, the kernel spills out "NaN"s when it’s given an identity matrix :twisted:

I found another matrix inversion source code in the NVIDIA forum; I’ll try it next Monday or Tuesday.

bye
Marco

Hi,

Oh, what a pity! But at least we have another option…

Thanks,
Fran

Hi Marco:

I have found three implementations that could be interesting.

Can I post their URLs here, or is that forbidden?

Bye
Fran

Hi Franja,

As I already mentioned: „There are many approaches for inverting matrices, …“. The challenge will be to get it running (efficiently) with CUDA (and handling the large matrices - I still don’t believe that this will be easy :wink: ). I’ll test the code from the NVIDIA forum as soon as possible.

Of course you can post the links here, as long as there are no legal reasons not to do so…

bye
Marco

Ok, ok

See you next week,

Bye
Fran

I’m also working with CUBLAS, and I’d also love to have a linear solver implemented in OpenCL. With JCublas alone, however, this is not possible: it only has methods that solve linear equations for triangular matrices, and inverting those is not very hard (concerning the algorithm). CULA has the right functions. POTRF and POTRS are very interesting for me, because I have positive definite symmetric matrices, and the Cholesky decomposition is, as far as I know, the fastest algorithm for dense matrices like those. But CULA is not free and is C, that’s the problem :wink:
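For reference, solving a triangular system with what JCublas does offer could look roughly like the following minimal sketch (the matrix values and the class name are just made up for this example):

```java
import jcuda.Pointer;
import jcuda.Sizeof;
import jcuda.jcublas.JCublas;

public class TriangularSolveExample
{
    public static void main(String args[])
    {
        // A small upper-triangular system U*x = b, stored in
        // column-major order, as CUBLAS expects it
        int n = 3;
        float U[] = {
            2, 0, 0,   // column 0
            1, 3, 0,   // column 1
            1, 2, 4    // column 2
        };
        float b[] = { 6, 10, 8 };

        JCublas.cublasInit();

        Pointer dU = new Pointer();
        Pointer db = new Pointer();
        JCublas.cublasAlloc(n * n, Sizeof.FLOAT, dU);
        JCublas.cublasAlloc(n, Sizeof.FLOAT, db);
        JCublas.cublasSetMatrix(n, n, Sizeof.FLOAT, Pointer.to(U), n, dU, n);
        JCublas.cublasSetVector(n, Sizeof.FLOAT, Pointer.to(b), 1, db, 1);

        // 'U' = upper triangular, 'N' = not transposed, 'N' = non-unit
        // diagonal. The solution x overwrites b on the device.
        JCublas.cublasStrsv('U', 'N', 'N', n, dU, n, db, 1);

        float x[] = new float[n];
        JCublas.cublasGetVector(n, Sizeof.FLOAT, db, 1, Pointer.to(x), 1);
        System.out.println(java.util.Arrays.toString(x)); // [1.0, 2.0, 2.0]

        JCublas.cublasFree(dU);
        JCublas.cublasFree(db);
        JCublas.cublasShutdown();
    }
}
```

Of course, this only covers the triangular case; the hard part is obtaining the factorization (e.g. via POTRF) in the first place.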

I also thought about implementing some matrix decompositions in OpenCL, but I came to the conclusion that at least Cholesky is much too sequential. Well, if CULA has a parallel Cholesky, I will have to give it a second try some time…

Franja, could you post the links you mentioned, or is it illegal? :wink:

Hello,

I assume that you already found it, but if not, you might be interested in ViennaCL. Of course, I already thought about a „JViennaCL“ :wink: but currently don’t have the time for that. Beyond that, it would certainly be nice to have more LAPACK and general solving and decomposition routines available in Java via JCuda or JOCL. (The problem may be that everyone wants to use these libraries to solve interesting problems, but no one is willing to write such a library.) But I think it’s mainly a matter of time until some libraries providing these functions are ported or routed through to Java.

bye

No, I hadn’t found it yet, thanks for the hint.

Hello

I have ported the source code that was posted in this thread. The kernel files have been left unmodified (except for adding ‘extern “C”’ to the functions).

I tried to contact the author and asked whether I may upload this to the jcuda.org website, still waiting for a response.

bye
Marco

Hello

I received a response from the author of the matrix inversion code. It’s OK to publish it, so I’ll upload it to the website soon.

bye
Marco

Hi Marco:

I am still trying to compile the .cu files.

I have added the “cl.exe” path (C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\bin) to the PATH variable, but the error continues.

I have clicked on “cl.exe”, and this message appears: “The program cannot start. mspdb100.dll is missing. You must reinstall the program”.

I have reinstalled Visual Studio, but everything stays the same.

bye
Fran

Hello

I cannot test this at the moment, but two things: You usually do not start cl.exe manually with a double click. The cl.exe should be called by the nvcc.exe internally. And if you receive an error message, please try to post it verbatim - I can probably not do much more than a websearch on the error message, but maybe it is possible to find an answer this way.
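For reference, the compilation itself is usually done with a call roughly like `nvcc -cubin MatrixInvert.cu -o MatrixInvert.cubin` (the file name here is only a placeholder). If nvcc does not find cl.exe on its own, the directory of cl.exe may also be passed explicitly via the `--compiler-bindir` option of nvcc.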

EDIT: By the way, did you manage to compile one of the NVIDIA SDK samples? They should compile right “out of the box”, simply by opening the Project file and selecting the build command from the menu.

bye
Marco

Hello

The example for matrix inversion has been uploaded to the website.

BTW @Franja: Did you manage to solve your problem with Visual Studio?

bye
Marco

Hi Marco:

After a short holiday, I am continuing with my project, and I have run into a problem:

I want to multiply a 1000x1000 matrix with a 1000x1 one. I have written this:

MatrixMultiply.multiply(inversa2, matriz.length, matriz.length, soluciones, 1, W);

and this is the error:

```
	at java.nio.FloatBuffer.wrap(Unknown Source)
	at jcuda.Pointer.to(Pointer.java:115)
	at recursos.MatrixMultiply.multiply(MatrixMultiply.java:77)
	at pruebas.RBF.entrenarQuda(RBF.java:70)
	at pruebas.RBF.entrenar(RBF.java:52)
	at pruebas.pruebonKohonen2.main(pruebonKohonen2.java:147)
```

What is the problem?

Thanks
Fran

Hello Franja

Well… presumably, one of the arrays you are passing in to this call is ‘null’. You might run it in a debugger, or simply add some
System.out.println("inversa2 "+inversa2);
System.out.println("soluciones "+soluciones);
System.out.println("W "+W);
before the call to see what’s wrong there.

BTW: I think the Matrix Inversion example requires the matrix size to be a multiple of the block size.

bye
Marco

Thanks Marco:

I have solved the multiplication problem, but I think that I will have problems with the inversion, due to the block size issue that you mentioned.

What is the “block size”?

Thanks

Fran

Hello

As you might already have read in the CUDA Programming Guide or similar documents, the threads in CUDA are organized in Blocks, and the Blocks are organized in Grids (as depicted, for example, on the left side of this image).

The block size is the number of threads per block. For the Matrix inversion, it is defined in the MatrixInvert class: MatrixInvert.BLOCKSIZE is currently 16, so there are 16x16 Threads in one Block. This roughly(!) means that the matrix is divided into Blocks of size 16x16, and each Block is computed individually.

Most examples (even the Matrix Multiplication Example from NVIDIA) make certain assumptions about the size of the matrix with respect to the block size. Generalizing such a task to arbitrary sizes may be difficult. To compute the last blocks, which are not filled completely, it might even be necessary to create special kernels. In the best case, it is possible to introduce some padding. That is, for example, to extend a matrix from size 30x30 to a Matrix of size 32x32 in a way that does not affect the result, but I’m not sure if (or even how) this could be done for the Gauß elimination in the matrix inversion example.
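To illustrate the padding idea with a sketch (hypothetical, and I have not verified whether the inversion kernels of the example actually tolerate this): one could embed the matrix into a larger block-diagonal matrix whose lower-right part is an identity matrix, since the inverse of such a matrix contains the inverse of the original matrix in its upper-left block:

```java
/**
 * A hypothetical helper: pads the given n x n matrix (row-major)
 * to the next multiple of 'blockSize' by embedding it into a
 * block-diagonal matrix whose lower-right part is the identity.
 * Mathematically, inv([[A,0],[0,I]]) = [[inv(A),0],[0,I]], so the
 * inverse of the original matrix could be read from the upper-left
 * n x n block of the inverse of the padded matrix.
 */
static float[] padToBlockSize(float matrix[], int n, int blockSize)
{
    int padded = ((n + blockSize - 1) / blockSize) * blockSize;
    float result[] = new float[padded * padded];
    for (int r = 0; r < padded; r++)
    {
        for (int c = 0; c < padded; c++)
        {
            if (r < n && c < n)
            {
                // Copy the original matrix into the upper-left block
                result[r * padded + c] = matrix[r * n + c];
            }
            else if (r == c)
            {
                // Fill the remaining diagonal with 1 (identity part)
                result[r * padded + c] = 1.0f;
            }
        }
    }
    return result;
}
```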

I’d still like to write a general Matrix Inversion for arbitrarily-sized matrices by porting the LAPACK “GETRF” function to use some of the CUBLAS functions, but as I already mentioned, this may not be so easy and might take some more time…
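For those who are curious what GETRF actually computes: it is an LU decomposition with partial pivoting. As a plain (and certainly inefficient) Java sketch of the unblocked algorithm, leaving out the blocking and all the CUBLAS calls that a real port would involve:

```java
/**
 * A plain, unblocked sketch of the LU factorization with partial
 * pivoting that LAPACK's GETRF performs (no checks for singular
 * matrices). The factors L (unit lower triangular) and U (upper
 * triangular) overwrite the n x n row-major input matrix 'a'; the
 * returned array contains the pivot row indices. This is only meant
 * to illustrate the algorithm, not a real port.
 */
static int[] lu(float a[], int n)
{
    int piv[] = new int[n];
    for (int k = 0; k < n; k++)
    {
        // Find the pivot row for column k
        int p = k;
        for (int i = k + 1; i < n; i++)
        {
            if (Math.abs(a[i * n + k]) > Math.abs(a[p * n + k])) p = i;
        }
        piv[k] = p;

        // Swap rows k and p
        for (int j = 0; j < n; j++)
        {
            float t = a[k * n + j];
            a[k * n + j] = a[p * n + j];
            a[p * n + j] = t;
        }

        // Eliminate below the pivot, storing the multipliers in L
        for (int i = k + 1; i < n; i++)
        {
            a[i * n + k] /= a[k * n + k];
            for (int j = k + 1; j < n; j++)
            {
                a[i * n + j] -= a[i * n + k] * a[k * n + j];
            }
        }
    }
    return piv;
}
```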

bye
Marco

Hello

You are right. Depending on random factors, my program generates matrices of different sizes. So sometimes the program finishes, and other times it generates errors.

If you manage to implement code for arbitrarily-sized matrices, you will be doing me a favor.

Thank you.

Fran

Hello Marco:

Regarding my questions from some weeks ago: I still haven’t solved my problem with Visual Studio.

Bye

Hi Marco:

Some weeks ago, you mentioned that you would try to write code for a general Matrix Inversion for arbitrarily-sized matrices.

I don’t want to be a nuisance, but I would like to know whether you have made any progress on this work.

Thank you very much.
Fran