NesterovSolverTest/2 keeps failing in runtest [Solved] #4229

Closed

Deepak- opened this issue May 28, 2016 · 3 comments

Comments

Deepak- commented May 28, 2016

I am trying to compile Caffe but make runtest gives me two failures:

[----------] Global test environment tear-down
[==========] 2009 tests from 269 test cases ran. (716073 ms total)
[  PASSED  ] 2007 tests.
[  FAILED  ] 2 tests, listed below:
[  FAILED  ] NesterovSolverTest/2.TestNesterovLeastSquaresUpdateWithEverythingShare, where TypeParam = caffe::GPUDevice<float>
[  FAILED  ] NesterovSolverTest/2.TestNesterovLeastSquaresUpdateWithEverything, where TypeParam = caffe::GPUDevice<float>

This is the exact error I am getting:

src/caffe/test/test_gradient_based_solver.cpp:370: Failure
The difference between expected_updated_weight and solver_updated_weight is 1.1920928955078125e-07, which exceeds error_margin, where
expected_updated_weight evaluates to 9.6857547760009766e-06,
solver_updated_weight evaluates to 9.8049640655517578e-06, and
error_margin evaluates to 1.0000000116860974e-07.
[  FAILED  ] NesterovSolverTest/2.TestNesterovLeastSquaresUpdateWithEverythingShare, where TypeParam = caffe::GPUDevice<float> (8073 ms)
[ RUN      ] NesterovSolverTest/2.TestLeastSquaresUpdateWithEverythingAccumShare
[       OK ] NesterovSolverTest/2.TestLeastSquaresUpdateWithEverythingAccumShare (28 ms)
[ RUN      ] NesterovSolverTest/2.TestNesterovLeastSquaresUpdateWithEverything
src/caffe/test/test_gradient_based_solver.cpp:370: Failure
The difference between expected_updated_weight and solver_updated_weight is 1.1920928955078125e-07, which exceeds error_margin, where
expected_updated_weight evaluates to 9.6857547760009766e-06,
solver_updated_weight evaluates to 9.8049640655517578e-06, and
error_margin evaluates to 1.0000000116860974e-07.
[  FAILED  ] NesterovSolverTest/2.TestNesterovLeastSquaresUpdateWithEverything, where TypeParam = caffe::GPUDevice<float> (7338 ms)


My system:

  • CentOS 7.2
  • multiple GPUs (GeForce GTX TITAN X used during runtest)
  • CUDA 7.5
  • cuDNN 4

This is probably not related, but immediately after I execute runtest, I get this printed around 80 times:
find: /usr/local/anaconda2/lib/liblzma.so.5: no version information available (required by /lib64/libselinux.so.1)

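Aside: since the full suite takes roughly 12 minutes here (716073 ms), it can help to re-run only the failing tests. Caffe's tests are built on googletest, so the test binary accepts the standard --gtest_filter flag. A minimal sketch, assuming a Makefile build where the binary lands at .build_release/test/test_all.testbin (that path is an assumption and depends on the build configuration):

# Assumed binary path for a Makefile build; adjust for CMake or debug builds.
.build_release/test/test_all.testbin --gtest_filter='NesterovSolverTest*'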

Deepak- commented May 28, 2016

I found the solution here. Before running make runtest, I did:
export CUDA_VISIBLE_DEVICES=0

Apparently it had to do with multiple GPUs.
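
Usage note: the variable can also be scoped to a single invocation instead of being exported for the whole shell session, for example:

# One-shot form of the same workaround: only GPU 0 is visible to this run.
CUDA_VISIBLE_DEVICES=0 make runtest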

@Deepak- Deepak- closed this as completed May 28, 2016
@Deepak- Deepak- changed the title NesterovSolverTest/2 keeps failing in runtest NesterovSolverTest/2 keeps failing in runtest [Solved] May 28, 2016
christopher5106 commented

Right, thanks a lot!

agarwal-ayushi commented

Hello,

I was recently trying to train the ImageNet dataset in Caffe. My system configuration is:
CUDA - 8.0
cuDNN - 7.0
GPU - GTX 1080 (4 GPUs)

When I try to do make runtest with export CUDA_VISIBLE_DEVICES=0, it fails with the error reported above; however, it passes with the other three GPUs, i.e. export CUDA_VISIBLE_DEVICES=1,2,3.
While training ImageNet, I get a memory allocation error when I run it on GPU 0. I get no error when I train on GPU 1, 2, or 3.
All the GPUs are exactly the same in configuration.
Could someone tell me what the issue might be?

-Ayushi
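
One way to narrow this down is to run the suite against each physical GPU in turn, which is essentially what is described above. A minimal sketch, assuming make runtest picks up CUDA_VISIBLE_DEVICES from the environment (as in the workaround earlier in this thread):

# Each iteration exposes exactly one physical GPU, which the tests then see as device 0.
for gpu in 0 1 2 3; do
  echo "=== make runtest on physical GPU $gpu ==="
  CUDA_VISIBLE_DEVICES=$gpu make runtest
done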
