I am trying to compile Caffe, but `make runtest` gives me two failures:
```
[----------] Global test environment tear-down
[==========] 2009 tests from 269 test cases ran. (716073 ms total)
[  PASSED  ] 2007 tests.
[  FAILED  ] 2 tests, listed below:
[  FAILED  ] NesterovSolverTest/2.TestNesterovLeastSquaresUpdateWithEverythingShare, where TypeParam = caffe::GPUDevice<float>
[  FAILED  ] NesterovSolverTest/2.TestNesterovLeastSquaresUpdateWithEverything, where TypeParam = caffe::GPUDevice<float>
```
This is the exact error I am getting:
```
src/caffe/test/test_gradient_based_solver.cpp:370: Failure
The difference between expected_updated_weight and solver_updated_weight is 1.1920928955078125e-07, which exceeds error_margin, where
expected_updated_weight evaluates to 9.6857547760009766e-06,
solver_updated_weight evaluates to 9.8049640655517578e-06, and
error_margin evaluates to 1.0000000116860974e-07.
[  FAILED  ] NesterovSolverTest/2.TestNesterovLeastSquaresUpdateWithEverythingShare, where TypeParam = caffe::GPUDevice<float> (8073 ms)
[ RUN      ] NesterovSolverTest/2.TestLeastSquaresUpdateWithEverythingAccumShare
[       OK ] NesterovSolverTest/2.TestLeastSquaresUpdateWithEverythingAccumShare (28 ms)
[ RUN      ] NesterovSolverTest/2.TestNesterovLeastSquaresUpdateWithEverything
src/caffe/test/test_gradient_based_solver.cpp:370: Failure
The difference between expected_updated_weight and solver_updated_weight is 1.1920928955078125e-07, which exceeds error_margin, where
expected_updated_weight evaluates to 9.6857547760009766e-06,
solver_updated_weight evaluates to 9.8049640655517578e-06, and
error_margin evaluates to 1.0000000116860974e-07.
[  FAILED  ] NesterovSolverTest/2.TestNesterovLeastSquaresUpdateWithEverything, where TypeParam = caffe::GPUDevice<float> (7338 ms)
```
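Worth noting (an observation, not from the original report): the failing difference, 1.1920928955078125e-07, is exactly 2^-23, which is machine epsilon for IEEE-754 single precision. The mismatch is therefore on the order of one float ulp, a borderline rounding discrepancy rather than a gross numerical error. A quick way to check the identity, assuming `python3` is available:

```shell
# The reported difference equals 2^-23 (FLT_EPSILON for IEEE-754
# single precision), so the failure is a one-ulp-scale rounding issue.
python3 -c 'print(1.1920928955078125e-07 == 2.0 ** -23)'
```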
My system:
- CentOS 7.2
- multiple GPUs (GeForce GTX TITAN X used during `runtest`)
- CUDA 7.5
- cuDNN 4
This is probably not related, but immediately after I execute `runtest`, I get this printed around 80 times:
```
find: /usr/local/anaconda2/lib/liblzma.so.5: no version information available (required by /lib64/libselinux.so.1)
```
Deepak- commented on May 28, 2016
I found the solution here. Before `make runtest`, I ran:
```
export CUDA_VISIBLE_DEVICES=0
```
Apparently it had to do with having multiple GPUs.
Issue title changed to "NesterovSolverTest/2 keeps failing in runtest [Solved]"
christopher5106 commented on Jul 15, 2016
Right, thanks a lot!
agarwal-ayushi commented on Nov 18, 2017
Hello,
I was recently trying to train on the ImageNet dataset in Caffe. My system configuration is:
- CUDA 8.0
- cuDNN 7.0
- GPU: GTX 1080 (4 GPUs)
When I run `make runtest` with `export CUDA_VISIBLE_DEVICES=0`, it fails with the error reported above; however, it passes with all of the other three GPUs, i.e. `export CUDA_VISIBLE_DEVICES=1,2,3`.
While training on ImageNet, I get a memory allocation error when I run on GPU 0, but no error when I train on GPU 1, 2, or 3.
All of the GPUs are exactly the same in configuration.
Could someone tell me what the issue might be?
-Ayushi
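One way to narrow this down (a sketch, not from the thread): `CUDA_VISIBLE_DEVICES` renumbers devices, so each run below sees its selected GPU as device 0. Running the suite once per physical GPU can isolate a faulty card; the `make runtest` line is left commented since it only works from inside a Caffe source tree.

```shell
# Hypothetical sketch: exercise each physical GPU in isolation.
# CUDA_VISIBLE_DEVICES renumbers devices, so inside every run the
# selected GPU appears as device 0.
for gpu in 0 1 2 3; do
  echo "testing physical GPU $gpu"
  # CUDA_VISIBLE_DEVICES=$gpu make runtest || echo "GPU $gpu failed"
done
```

If the failure follows one physical card regardless of which logical slot it occupies, that points at the hardware (or its cooling/power) rather than the software stack.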