Description
While attempting to build torch from master with cutorch against CUDA 9.0.103-1 on Ubuntu 16.04, I hit an error caused by multiple overloads of the "==" and "!=" operators.
Below is an example of the error I receive.
lib/THC/CMakeFiles/THC.dir/build.make:4243: recipe for target 'lib/THC/CMakeFiles/THC.dir/THC_generated_THCTensorMathPairwise.cu.o' failed
make[2]: *** [lib/THC/CMakeFiles/THC.dir/THC_generated_THCTensorMathPairwise.cu.o] Error 1
/pkgbuild/torch/torch/extra/cutorch/lib/THC/generic/THCTensorMath.cu(393): error: more than one operator "==" matches these operands:
function "operator==(const __half &, const __half &)"
function "operator==(half, half)"
operand types are: half == half
/pkgbuild/torch/torch/extra/cutorch/lib/THC/generic/THCTensorMath.cu(414): error: more than one operator "==" matches these operands:
function "operator==(const __half &, const __half &)"
function "operator==(half, half)"
operand types are: half == half
I was able to track down the two operator overloads.
One is in
https://github.com/torch/cutorch/blob/master/lib/THC/THCTensorTypeUtils.cuh#L176
And the other is in
/usr/local/cuda-9.0/targets/ppc64le-linux/include/cuda_fp16.hpp
The operator in cuda_fp16.hpp is provided by the CUDA package, but it only covers __device__ code, not __host__ code. So we still need to overload "==" for halfs on the __host__ side; however, the code currently in cutorch fails at compile time.
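To make the clash concrete, here is a minimal sketch (the operator body and kernel below are illustrative assumptions, not the actual cutorch code; only the two conflicting signatures come from the error output above):

```cuda
// sketch.cu -- illustrative only, not the actual cutorch source.
#include <cuda_fp16.h>

// A cutorch-style overload that also works in __host__ code, similar in
// spirit to the one in THCTensorTypeUtils.cuh (the real body differs).
__host__ __device__ inline bool operator==(half a, half b) {
#ifdef __CUDA_ARCH__
  return __heq(a, b);                         // device: half comparison intrinsic
#else
  return __half2float(a) == __half2float(b);  // host: compare via float
#endif
}

__global__ void compareKernel(half a, half b, bool *out) {
  // CUDA 9's cuda_fp16.hpp also defines a __device__ operator== for half,
  // so this comparison matches two overloads and nvcc stops with
  // "more than one operator == matches these operands".
  *out = (a == b);
}
```

Compiling a file like this with nvcc from CUDA 9 reproduces the ambiguity; adding -D__CUDA_NO_HALF_OPERATORS__ removes the header's operators so only the cutorch-style overload remains.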
It looks like @csarofeen worked on the initial CUDA 9.0 port for cutorch. Could he perhaps provide some insight into what's going on here?
Is there any additional information you need from me? Thanks in advance!!
Activity
csarofeen commented on Aug 23, 2017
Before you build, try:
export TORCH_NVCC_FLAGS="-D__CUDA_NO_HALF_OPERATORS__"
dllehr81 commented on Aug 23, 2017
Hey @csarofeen, that did the trick! On a side note, this appears to disable the half operators in the CUDA code. Will this impact half-precision performance when run on the device?
csarofeen commented on Aug 23, 2017
It will, for the better.
betterjordache commented on Sep 8, 2017
@csarofeen this did it for me as well, thank you!
ProGamerGov commented on Oct 24, 2017
I had the same issue. Running ./clean.sh, then using export TORCH_NVCC_FLAGS="-D__CUDA_NO_HALF_OPERATORS__", and finally running ./install.sh worked! I was using Ubuntu 16.04.
ProGamerGov commented on Oct 24, 2017
@csarofeen If it's better to disable the half operators, then what are they used for? Why are they included in the CUDA code? And what kind of performance boost are we talking about here?
csarofeen commented on Oct 24, 2017
CUDA 9 added half operators to the CUDA half header. Half operations in torch predate that, so overloads already existed in torch. The flag keeps the half type definition from the CUDA header while not compiling its operators.
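Conceptually, the operators in the CUDA 9 header are guarded behind that macro, roughly like this (a simplified sketch of the pattern, not the exact contents of cuda_fp16.hpp):

```cuda
// Simplified sketch of how cuda_fp16.hpp gates its half operators.
// Building with -D__CUDA_NO_HALF_OPERATORS__ keeps the half type itself
// available but skips these overloads, so cutorch's own host/device
// overloads are the only candidates left.
#if !defined(__CUDA_NO_HALF_OPERATORS__)
__device__ __forceinline__ bool operator==(const __half &lh, const __half &rh) { return __heq(lh, rh); }
__device__ __forceinline__ bool operator!=(const __half &lh, const __half &rh) { return __hne(lh, rh); }
#endif /* !defined(__CUDA_NO_HALF_OPERATORS__) */
```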
ProGamerGov commented on Oct 25, 2017
@csarofeen Do you have any other performance tips for CUDA and/or cuDNN with Torch7?
Because I've noticed that CUDA 9.0 and cuDNN v7 have even worse performance than CUDA 8.0 and cuDNN v5: jcjohnson/neural-style#429
sfzyk commented on Nov 15, 2017
Same issue, but export TORCH_NVCC_FLAGS="-D__CUDA_NO_HALF_OPERATORS__" didn't work for me. How can I solve that?
csarofeen commented on Nov 16, 2017
@sfzyk Could you please explain all the steps you took to install CUDA, NCCL, cuDNN, and pytorch, and paste some of the error output here? It is very hard to assist when the only information provided is "didn't work".