Calculation on GPU leads to driver error "stopped responding"

I have this little nonsense script here which I am executing in MATLAB R2013b:

clear all;

n = 2000;
times = 50;
i = 0;

tCPU = tic;

disp 'CPU::'
A = rand(n, n);
B = rand(n, n);
disp '::Go'
for i = 0:times
    CPU = A * B;
end

tCPU = toc(tCPU);
tGPU = tic;

disp 'GPU::'
A = gpuArray(A);
B = gpuArray(B);
disp '::Go'
for i = 0:times
    GPU =  A * B ; 
end
tGPU = toc(tGPU);

fprintf('On CPU: %.2f sec\nOn GPU: %.2f sec\n', tCPU, tGPU);

Unfortunately after execution I receive a message from Windows saying: "Display driver stopped working and has recovered.".

Which I assume means that Windows did not get response from my graphic cards driver or something. The script returned without errors:

>> test
CPU::
::Go
GPU::
::Go
On CPU: 11.01 sec
On GPU: 2.97 sec

But no matter if the GPU runs out of memory or not, MATLAB is not able to use the GPU device before I restarted it. If I don't restart MATLAB I receive just a message from CUDA:

>> test
Warning: An unexpected error occurred during CUDA
execution. The CUDA error was:
CUDA_ERROR_LAUNCH_TIMEOUT 
> In test at 1 
Warning: An unexpected error occurred during CUDA
execution. The CUDA error was:
CUDA_ERROR_LAUNCH_TIMEOUT 
> In test at 1 
Warning: An unexpected error occurred during CUDA
execution. The CUDA error was:
CUDA_ERROR_LAUNCH_TIMEOUT 
> In test at 1 
Warning: An unexpected error occurred during CUDA
execution. The CUDA error was:
CUDA_ERROR_LAUNCH_TIMEOUT 
> In test at 1 
CPU::
::Go
GPU::
Error using gpuArray
An unexpected error occurred during CUDA execution.
The CUDA error was:
the launch timed out and was terminated

Error in test (line 21)
A = gpuArray(A);

Does anybody know how to avoid this issue or what I am doing wrong here?

If needed, my GPU Device:

>> gpuDevice

ans = 

  CUDADevice with properties:

                      Name: 'GeForce GTX 660M'
                     Index: 1
         ComputeCapability: '3.0'
            SupportsDouble: 1
             DriverVersion: 6
            ToolkitVersion: 5
        MaxThreadsPerBlock: 1024
          MaxShmemPerBlock: 49152
        MaxThreadBlockSize: [1024 1024 64]
               MaxGridSize: [2.1475e+09 65535 65535]
                 SIMDWidth: 32
               TotalMemory: 2.1475e+09
                FreeMemory: 1.9037e+09
       MultiprocessorCount: 2
              ClockRateKHz: 950000
               ComputeMode: 'Default'
      GPUOverlapsTransfers: 1
    KernelExecutionTimeout: 1
          CanMapHostMemory: 1
           DeviceSupported: 1
            DeviceSelected: 1

Answers


The key piece of information is this part of the gpuDevice output:

KernelExecutionTimeout: 1

This means that the host display driver is active on the GPU you are running the compute jobs on. The NVIDIA display driver contains a watchdog timer which kills any task which takes more than a predefined amount of time without yielding control back to the driver for screen refresh. This is intended to prevent the situation where a long running or stuck compute job renders the machine unresponsive by freezing the display. The runtime of your Matlab script is clearly exceeding the display driver watchdog timer limit. Once that happens, the the compute context held on the device is destroyed and Matlab can no longer operate with the device. You might be able to reinitialise the context by calling reset, which I guess will run cudaDeviceReset() under the cover.

There is a lot of information about this watchdog timer on the interweb - for example this Stack Overflow question. The solution for how to modify this timeout is dependent on your OS and hardware. The simplest way to avoid this is to not run CUDA code on a display GPU, or increase the granularity of your compute jobs so that no one operation has a runtime which exceeds the timeout limit. Or just write faster code...


Need Your Help

Workflow tool comaparison: Oozie Vs Cascading

hadoop mapreduce cascading oozie

I am looking for a workflow tool to run complex map-reduce jobs. I have Oozie in mind but also want to explore Cascading. Is there any sample code or example that chains existing M/R jobs using cas...