Multiplying and inverting matrices

Hello Franja,

I’m pretty busy at the moment, have been on a business trip this week, and my ‘todo’ list is constantly growing :frowning: But it has not been forgotten: I’ll hopefully have the chance to continue with what I already started. I’ll keep on posting any progress here.

bye
Marco

Thank you Marco

Hello

There is a long chain of functions that needs to be ported when starting from the basic LU decomposition. It is quite complicated, error-prone due to the weird Fortran indexing scheme, and in any case, such a direct port of the Fortran code would most likely not be very efficient.
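
To illustrate what such a port boils down to: the elimination core of an LU decomposition is short, but everything around it (pivoting, blocking, the 1-based Fortran indexing) is where the errors creep in. Here is a minimal, unpivoted Doolittle sketch in plain, 0-based Java (class and method names are made up for illustration; this is not the ported LAPACK code, and real libraries add partial pivoting):

```java
// Minimal in-place Doolittle LU decomposition, 0-based indexing.
// A sketch only: no pivoting, so it fails when a zero pivot appears.
public class SimpleLU {
    // Overwrites a with L (strictly below the diagonal, unit diagonal
    // implied) and U (on and above the diagonal).
    public static void lu(double[][] a) {
        int n = a.length;
        for (int k = 0; k < n; k++) {
            for (int i = k + 1; i < n; i++) {
                a[i][k] /= a[k][k];               // multiplier l_ik
                for (int j = k + 1; j < n; j++) {
                    a[i][j] -= a[i][k] * a[k][j]; // eliminate column k
                }
            }
        }
    }

    public static void main(String[] args) {
        double[][] a = {{4, 3}, {6, 3}};
        lu(a);
        // L = [[1, 0], [1.5, 1]], U = [[4, 3], [0, -1.5]]
        System.out.println(a[1][0] + " " + a[1][1]); // 1.5 -1.5
    }
}
```

The core is a dozen lines; the difficulty of the port lies in translating the surrounding pivoting and blocking logic correctly between the two indexing conventions.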

I strongly recommend that you use one of the existing Java libraries for matrix inversion. Why make life harder than it has to be?

Maybe I’ll try to continue with this when I have (much) more time, but don’t know if or when or how long it will take. If you wish, I can send you what I have done so far (it’s quite ugly and messy, I’ll certainly not post it here).

bye
Marco

ok, thanks

Hello,

May I ask for which purpose you need this matrix inversion? If it is just a part of a larger program, you can certainly make clear that it is not possible to invest weeks or months into such a “small” task, especially when the result will most likely not be faster than existing libraries, considering the complexity and the knowledge and experience that is necessary to implement this properly with CUDA.
If the matrix inversion with CUDA is explicitly stated to be one task (maybe in a homework assignment or the practical part of some thesis), then (I should not program it for you anyhow :wink: and) you may want to have a look at the matrix inversion that is already available on the website (the one which uses the specific BLOCK_SIZE), and see if it can be adjusted or extended to work with arbitrary BLOCK_SIZEs. This is the approach I would try next, since plainly porting the LAPACK Fortran code does not seem to be the right way to go at the moment…

bye
Marco

Hi Marco:

I regret the delay in replying, but I’ve been out of the country.

I am indeed writing my thesis, but the thesis only shows that by parallelizing a certain process I’ve improved enormously on a particular task (which is the subject of my thesis).

I don’t need a perfect implementation of the matrix inversion method, but if we did get a perfect one, we could write an article together with related material that I have, and publish it (I cannot use this article for my thesis, but an article in a research journal is always interesting).

I do not know you and I do not know if you might be interested or not.

Fran

Hello,

I’ll send you a PM about that.

bye
Marco

Hi Marco:

I’m very happy to write to you again.

I hope you are all right too.

Your last PM had great news.

Perhaps we can get the matrix inversion code.

Thank you very much.

Hi Marco:

What about the matrix inversion code?

Do you think that you’ll get it?

Thank you very much

See you soon
Franja

Hello

I thought you’d try to use the code from the thread that I linked. But I’ll also try to build a “cleaned up” example based on that. I’m still rather busy, I’m at a conference next week, and am still updating to CUDA 4.0, apart from my efforts in JNpp and other things, but I’ve planned a vacation for the end of May, so chances are high that I will then find enough time for this.

bye

You have probably already covered it, but if not take a look at MAGMA. http://icl.cs.utk.edu/magma/

“Covered” not in the sense that there already is a “JMagma” :wink: but I’ve seen this library quite a while ago. There are several libraries built on top of CUDA and OpenCL at the moment (thrust, ViennaCL, Magma and others). It’s probably just a matter of time until the corresponding Java bindings are available for these :slight_smile:

The matrix inversion example has been uploaded to the samples page, it is available as http://jcuda.org/samples/JCublasMatrixInvert.java.

Hi Marco:

I have been studying your matrix inversion code and it’s fantastic.

However, I have a doubt about the complexity of this algorithm.

Could you help me with this question?

Thanks
Franja

Hello

Back again after more than one year :wink:

The code is not entirely written by me, but based on the code from this thread: http://forum.byte-welt.de/showthread.php?p=14767#post14767 (this is also mentioned in the header).

It performs an LU factorization, which should be in O(n^3) in the given form, according to http://en.wikipedia.org/wiki/Computational_complexity_of_mathematical_operations#Matrix_algebra
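
For reference, the O(n^3) can be read off directly from the elimination loops: step k of the factorization updates the (n−k)×(n−k) trailing submatrix, so the total number of multiply-add operations is roughly

```latex
\sum_{k=1}^{n-1} (n-k)^2 \;=\; \sum_{m=1}^{n-1} m^2 \;=\; \frac{(n-1)\,n\,(2n-1)}{6} \;\approx\; \frac{n^3}{3} \;\in\; O(n^3)
```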

bye
Marco

Hi Marco:

Glad to hear from you after a year (I was unable to work on my thesis this year and now I’m starting over).

My question was not about the complexity of the standard method, but the complexity of the CUDA parallel method …

A greeting

Fran

Hi

Well, I’m not sure what you mean… The asymptotic, algorithmic complexity ( http://en.wikipedia.org/wiki/Big_O_notation ) remains the same, regardless of whether it is executed in parallel or not.

Although, in some sense, this is questionable - there are some ideas about how to describe “complexity” for parallel processors, like in this quick websearch result: http://igoro.com/archive/big-oh-in-the-parallel-world/ . Of course, something like the addition of 2 vectors with ‘n’ elements has a complexity of O(n) in the sequential case, whereas with CUDA, on ‘n’ cores, it somehow only has O(1). I don’t know whether there is a formal and generally accepted notation for this…
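
The vector addition example can be sketched in plain Java with parallel streams (not CUDA, but the same one-result-per-index structure as a one-thread-per-element kernel; class and method names are made up for illustration):

```java
import java.util.Arrays;
import java.util.stream.IntStream;

public class ParallelAdd {
    // Each output element depends only on one index, so all n additions
    // are independent: O(n) total work, but a critical path of O(1).
    public static double[] add(double[] x, double[] y) {
        double[] z = new double[x.length];
        IntStream.range(0, x.length).parallel().forEach(i -> z[i] = x[i] + y[i]);
        return z;
    }

    public static void main(String[] args) {
        double[] z = add(new double[]{1, 2, 3}, new double[]{4, 5, 6});
        System.out.println(Arrays.toString(z)); // [5.0, 7.0, 9.0]
    }
}
```

With n cores (or a CUDA thread per element), the independent iterations can all run at once, which is the informal sense in which the parallel running time becomes O(1).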

bye
Marco

Hello Marco:

As you can see every day I feel a little more lost.

Now I have two new problems:

  1. I had to format my computer and now I cannot run your file “JCublasMatrixInvert.java”. I program with Eclipse.
    I installed “cudatoolkit_4.2.9_win_64.msi”, “devdriver_4.2_winvista-win7_64_301.32_general.exe” and “gpucomputingsdk_4.2.9_win_64.exe”, as the CUDA developer area instructed.
    I also added the libraries “jcublas-0.3.1.jar”, “jcuda-0.3.1.jar” and “jcudaUtils-0.0.1.zip” to my java project.
    After all this, I get the following error:
    Error while loading native library with base name “JCublas”
    Operating system name: Windows 7
    Architecture: x86
    Architecture bit size: 32
    Exception in thread “main” java.lang.UnsatisfiedLinkError: Could not load native library
    at jcuda.LibUtils.loadLibrary(LibUtils.java:79)
    at jcuda.jcublas.JCublas.assertInit(JCublas.java:174)
    at jcuda.jcublas.JCublas.cublasInit(JCublas.java:198)
    at matrixInversion.JCublasMatrixInvert.main(JCublasMatrixInvert.java:34)

  2. My second problem is that I must calculate the complexity of both the matrix multiplication and the matrix inversion.
    I know that, without CUDA, the complexities are O(n^3), but when we apply all the parallelization I do not know how to deal with that problem. In fact, since we just call the JCublas libraries, I cannot even see the source code.

I hope you can help me.

thank you very much
Fran

You have to use JCuda version 0.4.2 - the version that matches the CUDA 4.2 toolkit.
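
One more thing worth double-checking: the log says “Architecture bit size: 32” although the 64-bit installers were used, which suggests Eclipse is running a 32-bit JVM that cannot load 64-bit native libraries. A tiny diagnostic (hypothetical class name) shows what the JVM reports:

```java
public class JvmArchCheck {
    public static void main(String[] args) {
        // "x86" indicates a 32-bit JVM; "amd64" or "x86_64" indicates 64-bit.
        // The bitness of the JVM must match the bitness of the native DLLs.
        System.out.println("os.arch = " + System.getProperty("os.arch"));
        System.out.println("java.library.path = " + System.getProperty("java.library.path"));
    }
}
```

If this prints “x86”, either point Eclipse at a 64-bit JRE or use the 32-bit native libraries instead.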

  1. My second problem is that I must calculate the complexity of both the matrix multiplication and the matrix inversion.
    I know that, without CUDA, the complexities are O(n^3), but when we apply all the parallelization I do not know how to deal with that problem. In fact, since we just call the JCublas libraries, I cannot even see the source code.

As I already posted above: I’m not aware of any concept that formally captures parallelism when computing asymptotic complexities, and during a quick websearch that I did for the previous answer I did not find anything that looked like a profound and reliable “standard” or “convention”. Furthermore, the general topic of complexity analysis is … well, complex :wink:

And of course, I have to admit that I’m not an expert in that. This also refers to the difference between the complexity of the problem itself and the running time of an algorithm (it has been a while since I heard my last lectures about theoretical computer science :o ). For example: sorting a list of numbers with comparisons has the complexity O(n log n). There is NO comparison-based algorithm that can sort a list faster, and there will never be one. This can be proved. For more complex tasks, it’s probably not so easy to find (and prove) the exact complexity. For example: a matrix multiplication “looks like” it has O(n^3), but researchers found that it can be done in O(n^2.807), later in O(n^2.3755), and later they found even lower exponents, as shown in the diagram on Matrix multiplication - Wikipedia
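
The “looks like O(n^3)” for matrix multiplication comes from the schoolbook algorithm: n^2 output entries, each an inner product of length n. A minimal Java sketch (square matrices only; names are made up for illustration):

```java
public class NaiveMatMul {
    // Schoolbook multiplication: n^2 entries, each needing n multiply-adds,
    // which gives n^3 operations in total.
    public static double[][] multiply(double[][] a, double[][] b) {
        int n = a.length;
        double[][] c = new double[n][n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                for (int k = 0; k < n; k++)
                    c[i][j] += a[i][k] * b[k][j];
        return c;
    }

    public static void main(String[] args) {
        double[][] c = multiply(new double[][]{{1, 2}, {3, 4}},
                                new double[][]{{5, 6}, {7, 8}});
        System.out.println(c[0][0] + " " + c[1][1]); // 19.0 50.0
    }
}
```

Strassen-like algorithms reach the lower exponents by trading some of these multiplications for extra additions on submatrices.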

So when it comes to the question of the actual asymptotic running time of a certain algorithm when parallelization should be taken into account, I’m not sure how this can be expressed. The complexity of the problem will stay the same, but I think the running time can be expressed, for example, by saying: “The complexity is O(n^3), and the running time is O(n^2) on O(n) processors”.
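
That informal phrasing lines up with the work–span model from parallel algorithm theory: with total work T_1 and span (critical-path length) T_∞, Brent’s theorem bounds the running time on p processors as

```latex
\frac{T_1}{p} \;\le\; T_p \;\le\; \frac{T_1}{p} + T_\infty
```

For an O(n^3)-work factorization on p = n processors, the first term alone already gives the O(n^2) running time from the example above.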

Do you have an advisor for your work, who might be able to give you more profound information about what he would expect you to write?

Now I get this new error:

Error while loading native library “JCublas-windows-x86” with base name “JCublas”
Operating system name: Windows 7
Architecture : x86
Architecture bit size: 32
Stack trace from the attempt to load the library as a resource:
java.lang.NullPointerException: No resource found with name ‘/lib/JCublas-windows-x86.dll’
at jcuda.LibUtils.loadLibraryResource(LibUtils.java:151)
at jcuda.LibUtils.loadLibrary(LibUtils.java:83)
at jcuda.jcublas.JCublas.initialize(JCublas.java:82)
at jcuda.jcublas.JCublas.<clinit>(JCublas.java:70)
at matrixInversion.JCublasMatrixInvert.main(JCublasMatrixInvert.java:34)
Stack trace from the attempt to load the library as a file:
java.lang.UnsatisfiedLinkError: no JCublas-windows-x86 in java.library.path
at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1709)
at java.lang.Runtime.loadLibrary0(Runtime.java:823)
at java.lang.System.loadLibrary(System.java:1028)
at jcuda.LibUtils.loadLibrary(LibUtils.java:94)
at jcuda.jcublas.JCublas.initialize(JCublas.java:82)
at jcuda.jcublas.JCublas.<clinit>(JCublas.java:70)
at matrixInversion.JCublasMatrixInvert.main(JCublasMatrixInvert.java:34)

Exception in thread “main” java.lang.UnsatisfiedLinkError: Could not load the native library
at jcuda.LibUtils.loadLibrary(LibUtils.java:129)
at jcuda.jcublas.JCublas.initialize(JCublas.java:82)
at jcuda.jcublas.JCublas.<clinit>(JCublas.java:70)
at matrixInversion.JCublasMatrixInvert.main(JCublasMatrixInvert.java:34)