#1 |
|
Guest
Posts: n/a
|
Hi,
I want to parallelize this matrix calculation with OpenCL / JOCL:
Code:
static void cal(DoubleMatrix1D Ri_Y, DoubleMatrix2D Mi_YY)
{
    for (int r = Ri_Y.size() - 1; r >= 0; r--)
    {
        Ri_Y.setQuick(r, expE(Ri_Y.getQuick(r)));
        if (Mi_YY != null)
        {
            for (int c = Mi_YY.columns() - 1; c >= 0; c--)
            {
                Mi_YY.setQuick(r, c, expE(Mi_YY.getQuick(r, c)));
            }
        }
    }
}
The matrices are from cern.colt.matrix. My first question is how to map these Java matrix objects to an OpenCL structure.
Regards
|
|
|
#2 |
|
Registered since: 05.08.2008
Posts: 378
|
Hello
So you want to replace all elements of a Vector and a Matrix with the value expE(element)?
Note that, as far as I know, there are currently no implementations of OpenCL that support double precision (except, maybe, on MacOS?). So the values of the matrix will probably have to be converted to a 1D array of float values for the computation. There may be several ways to achieve this, and it's hard to tell beforehand which one is best. A first approach would be to simply walk through the matrix and write the values into a float array:
Code:
for (int r=0; r<rows; r++)
{
for (int c=0; c<cols; c++)
{
floatArray[c+r*cols] = (float)matrix.getQuick(r,c);
}
}
Since you did not use anything like getNonZeros in the existing code, I assume that the matrices are dense (or more specifically, that they are of the specific type DenseDoubleMatrix1D/2D). If the OpenCL implementation supported double values, you could even consider using a specific subclass of DenseDoubleMatrix that exposes the internally used array of values via a get method (this is possible since the array is only protected, not private). This would save the effort of the loop above, since you could copy this array directly into a cl_mem object.
bye
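To make the row-major layout concrete, here is a minimal sketch of the copy in both directions. It is only an illustration: a plain double[][] stands in for the Colt matrix (with Colt, the reads and writes would go through getQuick and setQuick), and the class and method names are made up for this example.

```java
// Sketch: row-major flattening of a dense matrix into a float array and back.
// Assumption: a plain double[][] stands in for DenseDoubleMatrix2D.
final class RowMajorFlatten {

    // Copies matrix[r][c] into flat[c + r * cols], narrowing each value to float.
    static float[] toFloatArray(double[][] matrix) {
        int rows = matrix.length;
        int cols = rows == 0 ? 0 : matrix[0].length;
        float[] flat = new float[rows * cols];
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                flat[c + r * cols] = (float) matrix[r][c];
            }
        }
        return flat;
    }

    // Writes the computed values back, widening each float to double.
    static void fromFloatArray(float[] flat, double[][] matrix) {
        int rows = matrix.length;
        int cols = rows == 0 ? 0 : matrix[0].length;
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                matrix[r][c] = flat[c + r * cols];
            }
        }
    }

    public static void main(String[] args) {
        double[][] m = {{1.0, 2.0, 3.0}, {4.0, 5.0, 6.0}};
        float[] flat = toFloatArray(m);
        System.out.println(java.util.Arrays.toString(flat));
    }
}
```

The flat array is what would be handed to the device buffer; the second method is the matching step after the results have been read back.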
|
|
|
|
|
#3 |
|
Guest
Posts: n/a
|
Hi Marco,
First, thanks a lot for your very detailed response.
"So you want to replace all elements of a Vector and a Matrix with the value expE(element)?"
Yes, this task is called 200,000 times per hour in an artificial learning program.
Thanks for everything
|
|
|
#4 |
|
Registered since: 05.08.2008
Posts: 378
|
Some more details might be helpful, e.g.
- whether this is a sparse or a dense matrix
- whether it HAS to be stored and/or computed in double precision
- whether this step or additional operations may be processed solely on the graphics card
For example, if you have a large sparse matrix which HAS to be in double precision, and the operation you described is the only one that may be performed on the GPU, the speedup might not be so great. But if you have a dense matrix with float entries, and you do NOT have to copy the data between the host and the device in each step, this could be more beneficial.
|
|
|
|
|
#5 |
|
Guest
Posts: n/a
|
Hi Marco,
I use DenseMatrix. The matrices use double; I need to estimate whether I can use float instead of double. I made a very simple benchmark:
1. convert the 1D matrix to a float array
2. run the exp calculation on the GPU with this OpenCL code:
Code:
private static String programSource =
    "__kernel void " +
    "sampleKernel(__global const float *a," +
    " __global float *c)" +
    "{" +
    " int gid = get_global_id(0);" +
    " c[gid] = exp(a[gid]) ;" +
    "}";
The first result is bad: the OpenCL code is 10 times slower, but my configuration is poor, with two pseudo GPUs (ATI Stream). This is the code whose execution time I measured:
Code:
// Set the arguments for the kernel
clSetKernelArg(kernel, 0, Sizeof.cl_mem, Pointer.to(memObjects[0]));
clSetKernelArg(kernel, 1, Sizeof.cl_mem, Pointer.to(memObjects[1]));
System.out.println("clSetKernelArg");

// Set the work-item dimensions
long global_work_size[] = new long[]{nb};
long local_work_size[] = new long[]{1};

// Execute the kernel
clEnqueueNDRangeKernel(commandQueue, kernel, 1, null, global_work_size, local_work_size, 0, null, null);
System.out.println("Execute the kernel");

// Read the output data
clEnqueueReadBuffer(commandQueue, memObjects[1], CL_TRUE, 0, n * Sizeof.cl_float, dst, 0, null, null);
I have a question: how can I use and reuse the same OpenCL program at each iteration without re-initializing the OpenCL context? I will run this benchmark with an NVIDIA card.
Thanks a lot
kim
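On the question of whether float is precise enough, one rough way to estimate the loss without a GPU is to compare a float-precision exp against Math.exp on the CPU. The sketch below is only an approximation of what a device would do: it narrows the input and the result to float, but a real OpenCL exp may have a different rounding error. The class name and the sample range are assumptions chosen for illustration.

```java
// Sketch: estimate the relative error introduced by computing exp in float.
// Assumption: narrowing input and output to float approximates a float kernel;
// the actual error of a device's exp implementation may differ.
final class FloatExpError {

    // Largest relative error over (steps + 1) evenly spaced samples in [lo, hi].
    // steps is assumed to be positive.
    static double maxRelativeError(double lo, double hi, int steps) {
        double worst = 0.0;
        for (int i = 0; i <= steps; i++) {
            double x = lo + (hi - lo) * i / steps;
            double exact = Math.exp(x);
            // Narrow the input, compute, then narrow the result.
            float approx = (float) Math.exp((float) x);
            double rel = Math.abs(approx - exact) / exact;
            worst = Math.max(worst, rel);
        }
        return worst;
    }

    public static void main(String[] args) {
        // Hypothetical input range; use the range your matrices actually contain.
        System.out.println(maxRelativeError(-10.0, 10.0, 100000));
    }
}
```

Since float carries roughly 7 significant decimal digits, the useful question is whether an error of that order is acceptable for the learning program, given how often the values are fed back into further iterations.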
|
|
|
#6 | |
|
Registered since: 05.08.2008
Posts: 378
|
Hello,
As I mentioned, there are several aspects that may influence the speedup that can be achieved. When you have Java code like
Java Code:
Java Code:
Quote:
Java Code:
So that in the actual "compute" method, you only have to copy the data to the device, execute the kernel, and copy the data back to Java.
bye
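The split between a one-time setup and a lightweight "compute" step could look roughly like the following. This is a sketch of the structure only, not the code from this thread: it runs on the CPU so that it works without JOCL, and the comments mark where the corresponding JOCL calls would go. The class name ExpComputer is hypothetical.

```java
// Sketch: initialize once, compute many times, release at the end.
// Assumption: the loop in compute() is a CPU stand-in for the kernel launch;
// the comments name the JOCL calls that would take its place.
final class ExpComputer implements AutoCloseable {

    private boolean ready;

    // One-time setup. With JOCL, this is where the context, command queue,
    // program and kernel would be created, for example via clCreateContext,
    // clCreateCommandQueue, clCreateProgramWithSource, clBuildProgram and
    // clCreateKernel. Buffers of a fixed size could be created here as well
    // (clCreateBuffer).
    ExpComputer() {
        ready = true;
    }

    // Per-call work. With JOCL, this would only copy the input to the device
    // (clEnqueueWriteBuffer), run the kernel (clEnqueueNDRangeKernel) and
    // read the result back (clEnqueueReadBuffer).
    void compute(float[] src, float[] dst) {
        if (!ready) {
            throw new IllegalStateException("ExpComputer is closed");
        }
        for (int i = 0; i < src.length; i++) {
            dst[i] = (float) Math.exp(src[i]);
        }
    }

    // One-time cleanup. With JOCL: clReleaseMemObject, clReleaseKernel,
    // clReleaseProgram, clReleaseCommandQueue, clReleaseContext.
    @Override
    public void close() {
        ready = false;
    }

    public static void main(String[] args) {
        float[] src = {0.0f, 1.0f, 2.0f};
        float[] dst = new float[src.length];
        try (ExpComputer computer = new ExpComputer()) {
            // The setup cost is paid once; every iteration only calls compute.
            for (int iteration = 0; iteration < 3; iteration++) {
                computer.compute(src, dst);
            }
        }
        System.out.println(java.util.Arrays.toString(dst));
    }
}
```

With a structure like this, a benchmark should time only the compute calls, since the setup and cleanup happen once per program run rather than once per iteration.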