
NesterovSolverTest/2 keeps failing in runtest [Solved] #4229

Description

I am trying to compile Caffe, but make runtest gives me two failures:

[----------] Global test environment tear-down
[==========] 2009 tests from 269 test cases ran. (716073 ms total)
[  PASSED  ] 2007 tests.
[  FAILED  ] 2 tests, listed below:
[  FAILED  ] NesterovSolverTest/2.TestNesterovLeastSquaresUpdateWithEverythingShare, where TypeParam = caffe::GPUDevice<float>
[  FAILED  ] NesterovSolverTest/2.TestNesterovLeastSquaresUpdateWithEverything, where TypeParam = caffe::GPUDevice<float>

This is the exact error I am getting:

src/caffe/test/test_gradient_based_solver.cpp:370: Failure
The difference between expected_updated_weight and solver_updated_weight is 1.1920928955078125e-07, which exceeds error_margin, where
expected_updated_weight evaluates to 9.6857547760009766e-06,
solver_updated_weight evaluates to 9.8049640655517578e-06, and
error_margin evaluates to 1.0000000116860974e-07.
[  FAILED  ] NesterovSolverTest/2.TestNesterovLeastSquaresUpdateWithEverythingShare, where TypeParam = caffe::GPUDevice<float> (8073 ms)
[ RUN      ] NesterovSolverTest/2.TestLeastSquaresUpdateWithEverythingAccumShare
[       OK ] NesterovSolverTest/2.TestLeastSquaresUpdateWithEverythingAccumShare (28 ms)
[ RUN      ] NesterovSolverTest/2.TestNesterovLeastSquaresUpdateWithEverything
src/caffe/test/test_gradient_based_solver.cpp:370: Failure
The difference between expected_updated_weight and solver_updated_weight is 1.1920928955078125e-07, which exceeds error_margin, where
expected_updated_weight evaluates to 9.6857547760009766e-06,
solver_updated_weight evaluates to 9.8049640655517578e-06, and
error_margin evaluates to 1.0000000116860974e-07.
[  FAILED  ] NesterovSolverTest/2.TestNesterovLeastSquaresUpdateWithEverything, where TypeParam = caffe::GPUDevice<float> (7338 ms)
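
For context, the assertion at src/caffe/test/test_gradient_based_solver.cpp:370 is a plain floating-point tolerance check: the weight the solver produced must match the value the test computed, to within error_margin. A minimal standalone sketch of the same arithmetic, using the values from the log above (not the actual Caffe test code):

    #include <cmath>
    #include <cstdio>

    int main() {
      // Values copied from the failure message above.
      const double expected_updated_weight = 9.6857547760009766e-06;
      const double solver_updated_weight   = 9.8049640655517578e-06;
      const double error_margin            = 1.0000000116860974e-07;

      // Fails because the absolute difference (~1.19e-07) is just above
      // the allowed margin (~1.0e-07).
      const double diff = std::fabs(expected_updated_weight - solver_updated_weight);
      std::printf("diff = %.17g, margin = %.17g -> %s\n",
                  diff, error_margin, diff <= error_margin ? "PASS" : "FAIL");
      return 0;
    }

So the failure is a marginal numerical mismatch rather than a gross error.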


My system:

  • CentOS 7.2
  • multiple GPUs (GeForce GTX TITAN X used during runtest)
  • CUDA 7.5
  • cuDNN 4

This is probably not related, but immediately after I execute make runtest, I get the following printed around 80 times:
find: /usr/local/anaconda2/lib/liblzma.so.5: no version information available (required by /lib64/libselinux.so.1)


Activity

Deepak- (Author) commented on May 28, 2016

I found the solution here. Before make runtest I did:
export CUDA_VISIBLE_DEVICES=0

Apparently it had to do with multiple GPUs.
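
With CUDA_VISIBLE_DEVICES=0 set, the process only sees the GPU that CUDA enumerates first, and it appears to the process as device 0, so all the runtest work lands on that single card. A minimal standalone check (not part of Caffe; compile with nvcc) to confirm which devices are visible after exporting the variable:

    #include <cstdio>
    #include <cuda_runtime.h>

    int main() {
      // Only devices listed in CUDA_VISIBLE_DEVICES are enumerated here.
      int count = 0;
      cudaGetDeviceCount(&count);
      std::printf("visible CUDA devices: %d\n", count);
      for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        std::printf("  device %d: %s\n", i, prop.name);
      }
      return 0;
    }

With export CUDA_VISIBLE_DEVICES=0 this should report exactly one device.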

Deepak- changed the title from "NesterovSolverTest/2 keeps failing in runtest" to "NesterovSolverTest/2 keeps failing in runtest [Solved]" on May 28, 2016
christopher5106 commented on Jul 15, 2016

Right, thanks a lot!!

agarwal-ayushi commented on Nov 18, 2017

Hello,

I was recently trying to train on the ImageNet dataset in Caffe. My system configuration is:

  • CUDA 8.0
  • cuDNN 7.0
  • GPU: GTX 1080 (4 GPUs)

When I do make runtest with export CUDA_VISIBLE_DEVICES=0, it fails with the error reported above; however, it passes with each of the other three GPUs, i.e. export CUDA_VISIBLE_DEVICES=1,2,3.
While training ImageNet, I get a memory allocation error when I run it on GPU 0, but no error when I train on GPU 1, 2, or 3.
All three GPUs are exactly the same in configuration.
Could someone tell me what could be the issue?

-Ayushi
