numactl --interleave=all ./testing_cgetrf -N 100 -N 1000 --range 10:90:10 --range 100:900:100 --range 1000:9000:1000 --range 10000:20000:2000
MAGMA 1.5.0  compiled for CUDA capability >= 3.5
CUDA runtime 7000, driver 7000. OpenMP threads 16. MKL 11.2.3, MKL threads 16. 
device 0: Tesla K40c, 745.0 MHz clock, 11519.6 MB memory, capability 3.5
device 1: Tesla K40c, 745.0 MHz clock, 11519.6 MB memory, capability 3.5
device 2: Tesla K40c, 745.0 MHz clock, 11519.6 MB memory, capability 3.5
Usage: ./testing_cgetrf [options] [-h|--help]

ngpu 1
    M     N   CPU GFlop/s (sec)   GPU GFlop/s (sec)   |PA-LU|/(N*|A|)
=========================================================================
  100   100     ---   (  ---  )      0.06 (   0.05)     ---   
 1000  1000     ---   (  ---  )    164.94 (   0.02)     ---   
   10    10     ---   (  ---  )      0.27 (   0.00)     ---   
   20    20     ---   (  ---  )      0.77 (   0.00)     ---   
   30    30     ---   (  ---  )      1.87 (   0.00)     ---   
   40    40     ---   (  ---  )      3.38 (   0.00)     ---   
   50    50     ---   (  ---  )      4.54 (   0.00)     ---   
   60    60     ---   (  ---  )      2.66 (   0.00)     ---   
   70    70     ---   (  ---  )      1.03 (   0.00)     ---   
   80    80     ---   (  ---  )      1.36 (   0.00)     ---   
   90    90     ---   (  ---  )      1.82 (   0.00)     ---   
  100   100     ---   (  ---  )      2.61 (   0.00)     ---   
  200   200     ---   (  ---  )     12.00 (   0.00)     ---   
  300   300     ---   (  ---  )     26.88 (   0.00)     ---   
  400   400     ---   (  ---  )     45.57 (   0.00)     ---   
  500   500     ---   (  ---  )     66.94 (   0.00)     ---   
  600   600     ---   (  ---  )     89.86 (   0.01)     ---   
  700   700     ---   (  ---  )    113.42 (   0.01)     ---   
  800   800     ---   (  ---  )    141.16 (   0.01)     ---   
  900   900     ---   (  ---  )    167.06 (   0.01)     ---   
 1000  1000     ---   (  ---  )    193.47 (   0.01)     ---   
 2000  2000     ---   (  ---  )    492.24 (   0.04)     ---   
 3000  3000     ---   (  ---  )    840.60 (   0.09)     ---   
 4000  4000     ---   (  ---  )   1069.88 (   0.16)     ---   
 5000  5000     ---   (  ---  )   1106.83 (   0.30)     ---   
 6000  6000     ---   (  ---  )   1360.64 (   0.42)     ---   
 7000  7000     ---   (  ---  )   1568.24 (   0.58)     ---   
 8000  8000     ---   (  ---  )   1735.00 (   0.79)     ---   
 9000  9000     ---   (  ---  )   1734.47 (   1.12)     ---   
10000 10000     ---   (  ---  )   1866.15 (   1.43)     ---   
12000 12000     ---   (  ---  )   2075.12 (   2.22)     ---   
14000 14000     ---   (  ---  )   2247.80 (   3.26)     ---   
16000 16000     ---   (  ---  )   2360.83 (   4.63)     ---   
18000 18000     ---   (  ---  )   2433.14 (   6.39)     ---   
20000 20000     ---   (  ---  )   2501.32 (   8.53)     ---   

numactl --interleave=all ./testing_cgetrf_gpu -N 100 -N 1000 --range 10:90:10 --range 100:900:100 --range 1000:9000:1000 --range 10000:20000:2000
MAGMA 1.5.0  compiled for CUDA capability >= 3.5
CUDA runtime 7000, driver 7000. OpenMP threads 16. MKL 11.2.3, MKL threads 16. 
device 0: Tesla K40c, 745.0 MHz clock, 11519.6 MB memory, capability 3.5
device 1: Tesla K40c, 745.0 MHz clock, 11519.6 MB memory, capability 3.5
device 2: Tesla K40c, 745.0 MHz clock, 11519.6 MB memory, capability 3.5
Usage: ./testing_cgetrf_gpu [options] [-h|--help]

    M     N   CPU GFlop/s (sec)   GPU GFlop/s (sec)   |PA-LU|/(N*|A|)
=========================================================================
  100   100     ---   (  ---  )      0.80 (   0.00)     ---  
 1000  1000     ---   (  ---  )    170.63 (   0.02)     ---  
   10    10     ---   (  ---  )      0.06 (   0.00)     ---  
   20    20     ---   (  ---  )      0.31 (   0.00)     ---  
   30    30     ---   (  ---  )      0.94 (   0.00)     ---  
   40    40     ---   (  ---  )      2.02 (   0.00)     ---  
   50    50     ---   (  ---  )      2.85 (   0.00)     ---  
   60    60     ---   (  ---  )      2.22 (   0.00)     ---  
   70    70     ---   (  ---  )      0.57 (   0.00)     ---  
   80    80     ---   (  ---  )      0.86 (   0.00)     ---  
   90    90     ---   (  ---  )      1.11 (   0.00)     ---  
  100   100     ---   (  ---  )      1.49 (   0.00)     ---  
  200   200     ---   (  ---  )      7.60 (   0.00)     ---  
  300   300     ---   (  ---  )     19.35 (   0.00)     ---  
  400   400     ---   (  ---  )     34.59 (   0.00)     ---  
  500   500     ---   (  ---  )     54.02 (   0.01)     ---  
  600   600     ---   (  ---  )     72.22 (   0.01)     ---  
  700   700     ---   (  ---  )     95.06 (   0.01)     ---  
  800   800     ---   (  ---  )    120.81 (   0.01)     ---  
  900   900     ---   (  ---  )    153.90 (   0.01)     ---  
 1000  1000     ---   (  ---  )    211.21 (   0.01)     ---  
 2000  2000     ---   (  ---  )    563.96 (   0.04)     ---  
 3000  3000     ---   (  ---  )   1001.87 (   0.07)     ---  
 4000  4000     ---   (  ---  )   1309.50 (   0.13)     ---  
 5000  5000     ---   (  ---  )   1080.90 (   0.31)     ---  
 6000  6000     ---   (  ---  )   1378.62 (   0.42)     ---  
 7000  7000     ---   (  ---  )   1610.78 (   0.57)     ---  
 8000  8000     ---   (  ---  )   1847.55 (   0.74)     ---  
 9000  9000     ---   (  ---  )   1967.49 (   0.99)     ---  
10000 10000     ---   (  ---  )   2114.02 (   1.26)     ---  
12000 12000     ---   (  ---  )   2345.70 (   1.96)     ---  
14000 14000     ---   (  ---  )   2469.35 (   2.96)     ---  
16000 16000     ---   (  ---  )   2570.63 (   4.25)     ---  
18000 18000     ---   (  ---  )   2630.74 (   5.91)     ---  
20000 20000     ---   (  ---  )   2679.95 (   7.96)     ---  
